Policy gradient in Lipschitz Markov decision processes
Publication: 747252
DOI: 10.1007/s10994-015-5484-1
zbMath: 1354.90166
OpenAlex: W2046859786
MaRDI QID: Q747252
Matteo Pirotta, Luca Bascetta, Marcello Restelli
Publication date: 23 October 2015
Published in: Machine Learning
Full work available at URL: https://doi.org/10.1007/s10994-015-5484-1
Related Items
- A Small Gain Analysis of Single Timescale Actor Critic
- Smoothing policies and safe policy gradients
- Risk-averse optimization of reward-based coherent risk measures
- On the sample complexity of actor-critic method for reinforcement learning with function approximation
- Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
Cites Work
- Policy search for motor primitives in robotics
- Lipschitz continuity of value functions in Markovian decision processes
- Stochastic optimal control. The discrete time case
- Minimization of functions having Lipschitz continuous first partial derivatives
- Collective motions of a shell structure
- Multivariate stochastic approximation using a simultaneous perturbation gradient approximation
- Line search algorithms with guaranteed sufficient decrease
- A Stochastic Approximation Method
- Solving connection and linearization problems within the Askey scheme and its \(q\)-analogue via inversion formulas