Policy-based Primal-Dual Methods for Convex Constrained Markov Decision Processes

From MaRDI portal
Publication:6399799

arXiv: 2205.10715 · MaRDI QID: Q6399799

Author name not available

Publication date: 21 May 2022

Abstract: We study convex Constrained Markov Decision Processes (CMDPs) in which the objective is concave and the constraints are convex in the state-action occupancy measure. We propose a policy-based primal-dual algorithm that updates the primal variable via policy gradient ascent and updates the dual variable via projected sub-gradient descent. Despite the loss of additivity structure and the nonconvex nature of the problem, we establish the global convergence of the proposed algorithm by leveraging a hidden convexity in the problem, and prove an $\mathcal{O}\left(T^{-1/3}\right)$ convergence rate in terms of both the optimality gap and the constraint violation. When the objective is strongly concave in the occupancy measure, we prove an improved convergence rate of $\mathcal{O}\left(T^{-1/2}\right)$. By introducing a pessimistic term to the constraint, we further show that zero constraint violation can be achieved while preserving the same convergence rate for the optimality gap. This is the first work in the literature to establish non-asymptotic convergence guarantees for policy-based primal-dual methods for solving infinite-horizon discounted convex CMDPs.
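
The following minimal sketch illustrates the primal-dual scheme described in the abstract: policy gradient ascent on the primal (policy) variable and projected sub-gradient descent on the dual (Lagrange multiplier) variable. It is not the authors' code (see the companion repository linked below); the tabular environment, the linear objective and constraint (a special case of the convex setting), and all step sizes and sizes are illustrative placeholders.

import numpy as np

# Illustrative sketch of a policy-based primal-dual loop on a small tabular CMDP.
# All quantities (transitions, rewards, costs, budget, step sizes) are placeholders.
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel P[s, a, s']
mu0 = np.ones(S) / S                         # initial state distribution
r = rng.uniform(size=(S, A))                 # reward defining the objective
c = rng.uniform(size=(S, A))                 # cost defining the constraint
b = 0.5 / (1 - gamma)                        # constraint budget: discounted cost <= b

def policy(theta):
    # Softmax (tabular) policy pi[s, a] parameterized by theta.
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def occupancy(pi):
    # Normalized discounted state-action occupancy measure induced by pi.
    P_pi = np.einsum("sa,sat->st", pi, P)
    d_state = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)
    return d_state[:, None] * pi

def q_values(pi, reward):
    # Exact Q^pi for a given per-step reward, via the Bellman equation.
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = (pi * reward).sum(axis=1)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return reward + gamma * P @ V

theta = np.zeros((S, A))       # primal variable: policy parameters
lam = 0.0                      # dual variable: Lagrange multiplier
eta_theta, eta_lam = 1.0, 0.1  # illustrative step sizes

for t in range(500):
    pi = policy(theta)
    d = occupancy(pi)
    # Primal step: policy gradient ascent on the Lagrangian <r - lam * c, d>.
    Q = q_values(pi, r - lam * c)
    d_state = d.sum(axis=1)
    adv = Q - (pi * Q).sum(axis=1, keepdims=True)
    theta += eta_theta * d_state[:, None] * pi * adv / (1 - gamma)
    # Dual step: projected sub-gradient descent, projecting lam onto [0, inf).
    violation = (c * d).sum() / (1 - gamma) - b
    lam = max(0.0, lam + eta_lam * violation)

d_final = occupancy(policy(theta))
print("objective:", (r * d_final).sum() / (1 - gamma))
print("constraint violation:", (c * d_final).sum() / (1 - gamma) - b)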




Has companion code repository: https://github.com/hyunin-lee/vr-pdpg








