Momentum-Based Policy Gradient Methods

From MaRDI portal
Publication: 6345011

arXiv: 2007.06680 · MaRDI QID: Q6345011

Author name not available

Publication date: 13 July 2020

Abstract: In this paper, we propose a class of efficient momentum-based policy gradient methods for model-free reinforcement learning, which use adaptive learning rates and do not require any large batches. Specifically, we propose a fast importance-sampling momentum-based policy gradient (IS-MBPG) method based on a new momentum-based variance-reduction technique and the importance sampling technique. We also propose a fast Hessian-aided momentum-based policy gradient (HA-MBPG) method based on the momentum-based variance-reduction technique and the Hessian-aided technique. Moreover, we prove that both the IS-MBPG and HA-MBPG methods reach the best known sample complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary point of the non-concave performance function, while requiring only one trajectory at each iteration. In addition, we present a non-adaptive version of the IS-MBPG method, IS-MBPG*, which also reaches the best known sample complexity of $O(\epsilon^{-3})$ without any large batches. In the experiments, we use four benchmark tasks to demonstrate the effectiveness of our algorithms.
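The core ingredient described in the abstract is a STORM-style momentum-based variance-reduced gradient estimator combined with an importance weight that corrects for evaluating the previous policy's gradient on a trajectory sampled from the current policy. The following is a minimal sketch of one such update step, not the paper's exact algorithm: the helper names (`grad_fn`, `logp_fn`) and the fixed `eta` and `beta` arguments are illustrative assumptions, whereas the paper's methods use specific adaptive learning-rate and momentum-weight schedules.

```python
import numpy as np

def momentum_vr_pg_step(theta, theta_prev, u_prev, grad_fn, logp_fn, tau, eta, beta):
    """One momentum-based variance-reduced policy-gradient step (sketch).

    Assumed interfaces (hypothetical, supplied by the user):
      grad_fn(params, tau): policy-gradient estimate of J(params) from a single trajectory tau
      logp_fn(params, tau): log-probability of trajectory tau under the policy with `params`

    Momentum recursion in the spirit of the abstract:
      u_t = g(tau; theta_t) + (1 - beta) * (u_{t-1} - w * g(tau; theta_{t-1}))
    where w is an importance weight, since tau was sampled from the current
    policy rather than the previous one.
    """
    g_new = grad_fn(theta, tau)
    g_old = grad_fn(theta_prev, tau)
    # Importance weight: probability of tau under the previous policy / under the current one.
    w = np.exp(logp_fn(theta_prev, tau) - logp_fn(theta, tau))
    u = g_new + (1.0 - beta) * (u_prev - w * g_old)
    theta_next = theta + eta * u  # gradient *ascent* on the performance function J
    return theta_next, u
```

In a training loop, only one trajectory is sampled per iteration and fed to this step, matching the single-trajectory property claimed in the abstract; the Hessian-aided variant replaces the importance-weighted correction term with one built from Hessian-vector information.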

Has companion code repository: https://github.com/gaosh/MBPG
