Policy Mirror Descent for Reinforcement Learning: Linear Convergence, New Sampling Complexity, and Generalized Problem Classes
Publication:6359420
DOI: 10.1007/s10107-022-01816-5
zbMath: 1512.90150
arXiv: 2102.00135
Wikidata: Q114852452
Scholia: Q114852452
MaRDI QID: Q6359420
Publication date: 29 January 2021
Nonlinear programming (90C30); Stochastic programming (90C15); Markov and semi-Markov decision processes (90C40)