

Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

From MaRDI portal
Publication:6325577

arXiv: 1909.08610 | MaRDI QID: Q6325577

Author name not available

Publication date: 18 September 2019

Abstract: Improving the sample efficiency of reinforcement learning has been a long-standing research problem. In this work, we aim to reduce the sample complexity of existing policy gradient methods. We propose a novel policy gradient algorithm called SRVR-PG, which requires only $O(1/\epsilon^{3/2})$ episodes to find an $\epsilon$-approximate stationary point of the nonconcave performance function $J(\theta)$ (i.e., a $\theta$ such that $\|\nabla J(\theta)\|_2^2 \le \epsilon$). This sample complexity improves the existing result of $O(1/\epsilon^{5/3})$ for stochastic variance-reduced policy gradient algorithms by a factor of $O(1/\epsilon^{1/6})$. We also propose a variant of SRVR-PG with parameter exploration, which explores the initial policy parameter from a prior probability distribution. We conduct numerical experiments on classic control problems in reinforcement learning to validate the performance of the proposed algorithms.
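The abstract does not spell out the SRVR-PG update. As a hedged illustration only, the sketch below assumes a SARAH/SPIDER-style recursive estimator: each epoch starts with a large-batch policy gradient estimate, and each inner step refreshes it with a small-batch correction $g(\tau \mid \theta_t) - w\, g(\tau \mid \theta_{t-1})$, where the importance weight $w = p(\tau \mid \theta_{t-1}) / p(\tau \mid \theta_t)$ accounts for trajectories being sampled from the current policy. The one-step Gaussian-policy problem, the REINFORCE-style estimator, and every name and hyperparameter (`srvr_pg_sketch`, `TARGET`, `SIGMA`, `N`, `B`, `eta`) are hypothetical and are not taken from the paper or its companion repository. (The stated improvement factor is simply the exponent gap: $5/3 - 3/2 = 1/6$.)

```python
import math
import random

# Hypothetical toy problem (an assumption, not from the paper): a one-step
# "episode" draws an action a ~ N(theta, SIGMA^2) and receives reward
# r(a) = -(a - TARGET)^2, so J(theta) = -(theta - TARGET)^2 - SIGMA^2.
SIGMA = 0.5
TARGET = 2.0


def reward(a):
    return -(a - TARGET) ** 2


def log_prob(a, theta):
    # Log-density of N(theta, SIGMA^2) up to an additive constant,
    # which cancels in the importance-weight ratio below.
    return -((a - theta) ** 2) / (2 * SIGMA ** 2)


def grad_estimate(a, theta):
    # REINFORCE-style estimate: r(a) * d/dtheta log pi(a | theta).
    return reward(a) * (a - theta) / SIGMA ** 2


def srvr_pg_sketch(theta, epochs=20, inner=5, N=500, B=50, eta=0.05, seed=0):
    """Hypothetical sketch of a recursive variance-reduced policy gradient."""
    rng = random.Random(seed)
    for _ in range(epochs):
        # Outer loop: large-batch reference gradient at the current iterate.
        actions = [rng.gauss(theta, SIGMA) for _ in range(N)]
        v = sum(grad_estimate(a, theta) for a in actions) / N
        for _ in range(inner):
            # Gradient *ascent* step, since J is maximized.
            theta_prev, theta = theta, theta + eta * v
            actions = [rng.gauss(theta, SIGMA) for _ in range(B)]
            correction = 0.0
            for a in actions:
                # Weight p(a | theta_prev) / p(a | theta): makes the second
                # term an unbiased estimate of grad J(theta_prev) even though
                # a was sampled from the new policy.
                w = math.exp(log_prob(a, theta_prev) - log_prob(a, theta))
                correction += grad_estimate(a, theta) - w * grad_estimate(a, theta_prev)
            # Recursive update: v tracks grad J(theta) without a fresh large batch.
            v = v + correction / B
    return theta
```

For example, `srvr_pg_sketch(0.0)` should return a value close to `TARGET`, the maximizer of the toy $J(\theta)$. The last inner-loop refresh of `v` in each epoch is discarded, because the next epoch recomputes a large-batch estimate; this keeps the loop simple at the cost of a few extra samples.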

Has companion code repository: https://github.com/xgfelicia/SRVRPG