Instance-Dependent ℓ∞-Bounds for Policy Evaluation in Tabular Reinforcement Learning
Publication:5151732
DOI: 10.1109/TIT.2020.3027316 ⋮ zbMath: 1473.62082 ⋮ arXiv: 1909.08749 ⋮ OpenAlex: W3090442079 ⋮ MaRDI QID: Q5151732
Martin J. Wainwright, Ashwin Pananjady
Publication date: 22 February 2021
Published in: IEEE Transactions on Information Theory
Full work available at URL: https://arxiv.org/abs/1909.08749
Point estimation (62F10) ⋮ Markov processes: estimation; hidden Markov models (62M05) ⋮ Stochastic approximation (62L20) ⋮ Markov and semi-Markov decision processes (90C40)
Related Items (4)
Accelerated and Instance-Optimal Policy Evaluation with Linear Function Approximation ⋮ Softmax policy gradient methods can take exponential time to converge ⋮ Settling the sample complexity of model-based offline reinforcement learning ⋮ Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis