Homotopic policy mirror descent: policy convergence, algorithmic regularization, and improved sample complexity
Publication:6608040
DOI: 10.1007/s10107-023-02017-4
MaRDI QID: Q6608040
Yan Li, Guanghui Lan, Tuo Zhao
Publication date: 19 September 2024
Published in: Mathematical Programming. Series A. Series B
Analysis of algorithms and problem complexity (68Q25); Nonconvex programming, global optimization (90C26); Stochastic programming (90C15); Markov and semi-Markov decision processes (90C40)