Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
From MaRDI portal
Publication:5162625
DOI: 10.1137/20M1331524 · OpenAlex: W3203759272 · MaRDI QID: Q5162625
Martin J. Wainwright, Koulik Khamaru, Michael I. Jordan, Ashwin Pananjady, Feng Ruan
Publication date: 3 November 2021
Published in: SIAM Journal on Mathematics of Data Science
Full work available at URL: https://arxiv.org/abs/2003.07337
- Analysis of algorithms and problem complexity (68Q25)
- Graph theory (including graph drawing) in computer science (68R10)
- Computer graphics; computational geometry (digital and algorithmic aspects) (68U05)
Related Items
- Accelerated and Instance-Optimal Policy Evaluation with Linear Function Approximation
- Softmax policy gradient methods can take exponential time to converge
Cites Work
- Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
- Near-optimal PAC bounds for discounted MDPs
- Stochastic approximation. A dynamical systems viewpoint.
- Geometrizing rates of convergence. II
- Geometrizing rates of convergence. III
- New method of stochastic approximation type
- Asymptotics in statistics. Some basic concepts.
- An adaptation theory for nonparametric confidence intervals
- A Framework For Estimation Of Convex Functions
- On the Almost Sure Rate of Convergence of Linear Stochastic Approximation Algorithms
- Robust Stochastic Approximation Approach to Stochastic Programming
- Acceleration of Stochastic Approximation by Averaging
- Asymptotic Statistics
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
- High-Dimensional Statistics
- [https://portal.mardi4nfdi.de/wiki/Publication:4743580 Approximation dans les espaces métriques et théorie de l'estimation]
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation
- A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation
- Instance-Dependent ℓ∞-Bounds for Policy Evaluation in Tabular Reinforcement Learning
- A Stochastic Approximation Method