Maximum a Posteriori Policy Optimisation
Publication:6303189
arXiv: 1806.06920, MaRDI QID: Q6303189
Remi Munos, Abbas Abdolmaleki, Jost Tobias Springenberg, Martin Riedmiller, Nicolas Heess, Yuval Tassa
Publication date: 14 June 2018
Abstract: We introduce a new algorithm for reinforcement learning called Maximum a Posteriori Policy Optimisation (MPO), based on coordinate ascent on a relative entropy objective. We show that several existing methods can be related directly to our derivation. We develop two off-policy algorithms and demonstrate that they are competitive with the state of the art in deep reinforcement learning. In particular, for continuous control, our method outperforms existing methods with respect to sample efficiency, premature convergence and robustness to hyperparameter settings while achieving similar or better final performance.
Has companion code repository: https://github.com/MotorCityCobra/C_plusplus_mpo