Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration
Publication: 5378202
DOI: 10.1162/NECO_a_00452
zbMath: 1414.68090
arXiv: 1301.3966
OpenAlex: W2133224499
Wikidata: Q47904761
Scholia: Q47904761
MaRDI QID: Q5378202
Voot Tangkaratt, Tingting Zhao, Masashi Sugiyama, Jun Morimoto, Hirotaka Hachiya
Publication date: 12 June 2019
Published in: Neural Computation
Full work available at URL: https://arxiv.org/abs/1301.3966
Learning and adaptive systems in artificial intelligence (68T05)
Artificial intelligence for robotics (68T40)
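As a rough illustration of the technique named in the title, and not code taken from the paper itself, a PGPE-style gradient estimator with importance-weighted sample reuse can be sketched as follows. In PGPE, deterministic policy parameters theta are drawn from a Gaussian hyper-policy with mean `mean` and standard deviation `std`; old samples are reused by reweighting them with the density ratio of the new to the old hyper-policy. All function and variable names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_logpdf(theta, mean, std):
    """Log-density of a diagonal Gaussian (illustrative helper)."""
    return -0.5 * np.sum(((theta - mean) / std) ** 2
                         + np.log(2 * np.pi * std ** 2))

def pgpe_gradient(thetas, returns, mean, std, old_mean=None, old_std=None):
    """Sketch of a PGPE gradient estimate of J(mean, std).

    thetas  : (n, d) policy parameters sampled from the sampling hyper-policy
    returns : (n,) episodic returns obtained with those parameters
    If old_mean/old_std are given, the samples came from the old
    hyper-policy and are reused via self-normalized importance weights
    p_new(theta) / p_old(theta).
    """
    n, d = thetas.shape
    if old_mean is None:
        w = np.ones(n)
    else:
        logw = np.array([gaussian_logpdf(t, mean, std)
                         - gaussian_logpdf(t, old_mean, old_std)
                         for t in thetas])
        w = np.exp(logw - logw.max())   # stabilize before normalizing
    w = w / w.sum()
    b = np.sum(w * returns)             # weighted baseline
    diff = thetas - mean
    # Gaussian score functions: d/d(mean) and d/d(std) of log p(theta)
    g_mean = np.sum((w * (returns - b))[:, None] * diff / std ** 2, axis=0)
    g_std = np.sum((w * (returns - b))[:, None]
                   * (diff ** 2 - std ** 2) / std ** 3, axis=0)
    return g_mean, g_std
```

On a toy objective such as R(theta) = -||theta - target||^2, the estimated mean-gradient points from the current hyper-policy mean toward the target, as expected.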
Related Items
- Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation
- Model-based reinforcement learning with dimension reduction
Cites Work
- Analysis and improvement of policy gradient estimation
- Approximate dynamic programming with a fuzzy parameterization
- Improving predictive inference under covariate shift by weighting the log-likelihood function
- Real-time reinforcement learning by sequential actor-critics and experience replay
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning