Momentum-Based Policy Gradient Methods

From MaRDI portal
Publication: 6345011

arXiv: 2007.06680 · MaRDI QID: Q6345011

Author name not available

Publication date: 13 July 2020

Abstract: In this paper, we propose a class of efficient momentum-based policy gradient methods for model-free reinforcement learning, which use adaptive learning rates and do not require any large batches. Specifically, we propose a fast importance-sampling momentum-based policy gradient (IS-MBPG) method based on a new momentum-based variance-reduction technique and the importance sampling technique. We also propose a fast Hessian-aided momentum-based policy gradient (HA-MBPG) method based on the momentum-based variance-reduction technique and the Hessian-aided technique. Moreover, we prove that both the IS-MBPG and HA-MBPG methods reach the best known sample complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary point of the non-concave performance function, while requiring only one trajectory at each iteration. In addition, we present a non-adaptive version of the IS-MBPG method, IS-MBPG*, which also reaches the best known sample complexity of $O(\epsilon^{-3})$ without any large batches. In the experiments, we use four benchmark tasks to demonstrate the effectiveness of our algorithms.
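The core ingredient described in the abstract is a STORM-style momentum-based variance-reduced gradient estimator combined with an importance weight that corrects for evaluating the previous policy's gradient on a trajectory sampled from the current policy. The following is a minimal sketch of one such update step, not the paper's exact algorithm: the helper names (`grad_fn`, `logp_fn`) and the fixed `eta` and `beta` arguments are illustrative assumptions, whereas the paper's methods use specific adaptive learning-rate and momentum-weight schedules.

```python
import numpy as np

def momentum_vr_pg_step(theta, theta_prev, u_prev, grad_fn, logp_fn, tau, eta, beta):
    """One momentum-based variance-reduced policy-gradient step (sketch).

    Assumed interfaces (hypothetical, supplied by the user):
      grad_fn(params, tau): policy-gradient estimate of J(params) from a single trajectory tau
      logp_fn(params, tau): log-probability of trajectory tau under the policy with `params`

    Momentum recursion in the spirit of the abstract:
      u_t = g(tau; theta_t) + (1 - beta) * (u_{t-1} - w * g(tau; theta_{t-1}))
    where w is an importance weight, since tau was sampled from the current
    policy rather than the previous one.
    """
    g_new = grad_fn(theta, tau)
    g_old = grad_fn(theta_prev, tau)
    # Importance weight: probability of tau under the previous policy / under the current one.
    w = np.exp(logp_fn(theta_prev, tau) - logp_fn(theta, tau))
    u = g_new + (1.0 - beta) * (u_prev - w * g_old)
    theta_next = theta + eta * u  # gradient *ascent* on the performance function J
    return theta_next, u
```

In a training loop, only one trajectory is sampled per iteration and fed to this step, matching the single-trajectory property claimed in the abstract; the Hessian-aided variant replaces the importance-weighted correction term with one built from Hessian-vector information.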

Has companion code repository: https://github.com/gaosh/MBPG
