Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients
Publication:5004371
DOI: 10.1162/neco_a_01387
OpenAlex: W3158799570
MaRDI QID: Q5004371
Filipe Mutz, Paulo Rauber, Jürgen Schmidhuber, Avinash Ummadisingu
Publication date: 30 July 2021
Published in: Neural Computation
Full work available at URL: https://qmro.qmul.ac.uk/xmlui/handle/123456789/72285
Cites Work
- Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Model-based contextual policy search for data-efficient generalization of robot skills
- Q($\lambda$) with Off-Policy Corrections
- Overcoming catastrophic forgetting in neural networks