Settling the sample complexity of model-based offline reinforcement learning
From MaRDI portal
Publication:6192326
DOI: 10.1214/23-aos2342
arXiv: 2204.05275
MaRDI QID: Q6192326
Yuting Wei, Yuejie Chi, Laixi Shi, Unnamed Author, Yuxin Chen
Publication date: 11 March 2024
Published in: The Annals of Statistics
Full work available at URL: https://arxiv.org/abs/2204.05275
Keywords: Markov decision process; minimax optimality; sample complexity; distribution shift; offline reinforcement learning
Cites Work
- Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
- Asymptotically efficient adaptive allocation rules
- \({\mathcal Q}\)-learning
- Error bounds for constant step-size \(Q\)-learning
- High-Dimensional Statistics
- Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
- Bandit Algorithms
- Instance-Dependent ℓ∞-Bounds for Policy Evaluation in Tabular Reinforcement Learning
- Performance Bounds in $L_p$‐norm for Approximate Value Iteration
- A Stochastic Approximation Method