Smoothing policies and safe policy gradients
Publication:6097096
DOI: 10.1007/s10994-022-06232-6, arXiv: 1905.03231, OpenAlex: W2944187456, MaRDI QID: Q6097096
Matteo Pirotta, Matteo Papini, Marcello Restelli
Publication date: 12 June 2023
Published in: Machine Learning
Full work available at URL: https://arxiv.org/abs/1905.03231
Cites Work
- Analysis and improvement of policy gradient estimation
- Policy gradient in Lipschitz Markov decision processes
- Stochastic optimal control. The discrete time case
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Compatible natural gradient policy search
- Probability theory. A comprehensive course
- Discounted Markov decision processes with utility constraints
- Approximate policy iteration: a survey and some new methods
- On Actor-Critic Algorithms
- Online portfolio selection
- A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
- A Stochastic Approximation Method