Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
From MaRDI portal
Publication: 5139670
DOI: 10.1137/19M1288012 | zbMath: 1451.93379 | arXiv: 1906.08383 | OpenAlex: W3109546547 | MaRDI QID: Q5139670
Tamer Başar, Alec Koppel, Kaiqing Zhang, Hao Zhu
Publication date: 10 December 2020
Published in: SIAM Journal on Control and Optimization
Full work available at URL: https://arxiv.org/abs/1906.08383
Classification (MSC): Optimal stochastic control (93E20); Stochastic learning and adaptive control (93E35); Stochastic systems in control theory (general) (93E03)
Related Items
- Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms
- A Stochastic Trust-Region Framework for Policy Optimization
- A Small Gain Analysis of Single Timescale Actor Critic
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search
- Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization
- Softmax policy gradient methods can take exponential time to converge
- On the sample complexity of actor-critic method for reinforcement learning with function approximation
- Geometry and convergence of natural policy gradient methods
- Recent advances in reinforcement learning in finance
- Enhance load forecastability: optimize data sampling policy by reinforcing user behaviors
- Smoothed functional-based gradient algorithms for off-policy reinforcement learning: a non-asymptotic viewpoint
- Multi-agent reinforcement learning: a selective overview of theories and algorithms
Uses Software
Cites Work
- An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
- Policy gradient in Lipschitz Markov decision processes
- Nonconvergence to unstable points in urn models and stochastic approximations
- Stochastic approximation. A dynamical systems viewpoint.
- Natural actor-critic algorithms
- Introductory lectures on convex optimization. A basic course.
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Newton-type methods for non-convex optimization under inexact Hessian information
- On the convergence properties of non-Euclidean extragradient methods for variational inequalities with generalized monotone operators
- Cubic regularization of Newton method and its global performance
- Lectures on Stochastic Programming
- Numerical Optimization
- On Actor-Critic Algorithms
- Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
- Infinite Time Horizon Maximum Causal Entropy Inverse Reinforcement Learning
- Actor-Critic-Type Learning Algorithms for Markov Decision Processes
- On the Convergence of Mirror Descent beyond Stochastic Convex Programming