On the sample complexity of actor-critic method for reinforcement learning with function approximation
Publication: 6134324
DOI: 10.1007/s10994-023-06303-2
arXiv: 1910.08412
OpenAlex: W2981237928
MaRDI QID: Q6134324
Alec Koppel, Harshat Kumar, Alejandro Ribeiro
Publication date: 22 August 2023
Published in: Machine Learning
Full work available at URL: https://arxiv.org/abs/1910.08412
Keywords: stochastic programming, Markov decision process, reinforcement learning, non-convex optimization, actor-critic
Cites Work
- Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions
- Markov chains and stochastic stability
- Policy gradient in Lipschitz Markov decision processes
- Natural actor-critic algorithms
- Asynchronous stochastic approximation and Q-learning
- Stochastic approximation with two time scales
- \({\mathcal Q}\)-learning
- TD-regularized actor-critic methods
- On Actor-Critic Algorithms
- DOI: 10.1162/153244302760200704
- Actor-Critic–Type Learning Algorithms for Markov Decision Processes
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- Policy Evaluation in Continuous MDPs With Efficient Kernelized Gradient Temporal Difference
- A Concentration Bound for Stochastic Approximation via Alekseev’s Formula
- Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
- Approximate Dynamic Programming
- The theory of dynamic programming
- Approximation by superpositions of a sigmoidal function
- A Small Gain Analysis of Single Timescale Actor Critic