An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
From MaRDI portal
Publication: 616967
DOI: 10.1016/j.sysconle.2010.08.013 · zbMath: 1209.90344 · OpenAlex: W2161270100 · MaRDI QID: Q616967
Publication date: 12 January 2011
Published in: Systems & Control Letters
Full work available at URL: https://doi.org/10.1016/j.sysconle.2010.08.013
Keywords: function approximation; actor-critic algorithm; constrained Markov decision processes; infinite horizon discounted cost criterion; simultaneous perturbation stochastic approximation
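The last keyword, simultaneous perturbation stochastic approximation (SPSA), refers to Spall's gradient-estimation scheme in which all coordinates of a parameter vector are perturbed at once, so a full gradient estimate costs only two function evaluations. A minimal sketch of the one-sided-pair SPSA estimator follows; the quadratic objective `f` and all names here are illustrative stand-ins, not the cost function or notation used in the paper itself.

```python
import numpy as np

def spsa_gradient(f, theta, c=0.1, rng=None):
    """Single SPSA gradient estimate of a scalar objective f at theta.

    f     : scalar-valued objective (a hypothetical stand-in for the
            discounted cost being minimized)
    theta : parameter vector (numpy array)
    c     : perturbation half-width
    """
    rng = rng or np.random.default_rng(0)
    # Rademacher (+/-1) perturbation of every coordinate simultaneously
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    # Two evaluations give an estimate of the whole gradient vector;
    # elementwise division by delta is valid since each entry is +/-1
    return (f(theta + c * delta) - f(theta - c * delta)) / (2 * c) / delta

# Usage: estimate the gradient of f(theta) = sum(theta^2), whose true
# gradient at [1, -2] is [2, -4]; single estimates are noisy, but they
# are unbiased for a quadratic, so their average converges to [2, -4].
f = lambda th: float(np.sum(th ** 2))
theta = np.array([1.0, -2.0])
print(spsa_gradient(f, theta))
```

In an actor-critic setting such estimates typically drive a slower policy-parameter update while a critic runs on a faster timescale, but that coupling is beyond this sketch.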
Related Items
- Constrained Markov decision processes with first passage criteria
- Dimension reduction based adaptive dynamic programming for optimal control of discrete-time nonlinear control-affine systems
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search
- Variance-constrained actor-critic algorithms for discounted and average reward MDPs
- Recent advances in reinforcement learning in finance
- An online actor-critic algorithm with function approximation for constrained Markov decision processes
- Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
- Smoothed functional-based gradient algorithms for off-policy reinforcement learning: a non-asymptotic viewpoint
- Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
- The Borkar-Meyn theorem for asynchronous stochastic approximations
Cites Work
- Natural actor-critic algorithms
- An actor-critic algorithm for constrained Markov decision processes
- Optimal flow control of a class of queueing networks in equilibrium
- Multivariate stochastic approximation using a simultaneous perturbation gradient approximation
- An analysis of temporal-difference learning with function approximation
- On Actor-Critic Algorithms
- A Kiefer-Wolfowitz algorithm with randomized differences
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- Perturbation theory and finite Markov chains