Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes
Publication:2687069
DOI: 10.1007/s10107-022-01816-5
OpenAlex: W3127686539
Wikidata: Q114852452
Scholia: Q114852452
MaRDI QID: Q2687069
Publication date: 1 March 2023
Published in: Mathematical Programming. Series A. Series B
Full work available at URL: https://arxiv.org/abs/2102.00135
Mathematics Subject Classification (MSC): Nonlinear programming (90C30); Stochastic programming (90C15); Markov and semi-Markov decision processes (90C40); Artificial intelligence (68Txx); Stochastic systems and control (93Exx)
Related Items (3)
- Softmax policy gradient methods can take exponential time to converge
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence
- Accelerating Primal-Dual Methods for Regularized Markov Decision Processes
Cites Work
- First-order and stochastic optimization methods for machine learning
- On the convergence properties of non-Euclidean extragradient methods for variational inequalities with generalized monotone operators
- Online Markov Decision Processes
- Functional Approximations and Dynamic Programming
- Robust Stochastic Approximation Approach to Stochastic Programming
- High-Dimensional Probability
- Finite-Dimensional Variational Inequalities and Complementarity Problems