New Versions of Gradient Temporal-Difference Learning
Publication: 6093230
DOI: 10.1109/TAC.2022.3213763 · arXiv: 2109.04033 · MaRDI QID: Q6093230
Author name not available
Publication date: 6 October 2023
Published in: IEEE Transactions on Automatic Control
Abstract: Sutton, Szepesvári, and Maei introduced the first gradient temporal-difference (GTD) learning algorithms compatible with both linear function approximation and off-policy training. The goals of this paper are (a) to propose several variants of GTD algorithms together with an extensive comparative analysis, and (b) to establish new theoretical analysis frameworks for them. The variants are based on convex-concave saddle-point interpretations of GTD algorithms, which unify all of them in a single framework and admit a simple stability analysis based on recent results on primal-dual gradient dynamics. Finally, a numerical comparative analysis is given to evaluate the approaches.
Full work available at URL: https://arxiv.org/abs/2109.04033
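To illustrate the saddle-point view mentioned in the abstract, here is a minimal sketch (not the paper's proposed variants) of the classical GTD2 update interpreted as a primal-dual gradient method on the convex-concave Lagrangian L(θ, w) = wᵀ(b − Aθ) − ½ wᵀCw, whose saddle point minimizes the mean-squared projected Bellman error. The environment, features, and step sizes below are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: GTD2 as primal-dual gradient dynamics.
# Assumptions (not from the paper): a deterministic 2-state chain
# 0 -> 1 (reward 1), 1 -> 0 (reward 0), tabular features, on-policy.
gamma = 0.9
phi = np.eye(2)          # tabular feature vectors
theta = np.zeros(2)      # primal variable: value-function weights
w = np.zeros(2)          # dual variable: correction weights
alpha, beta = 0.05, 0.1  # primal / dual step sizes

s = 0
for _ in range(20000):
    s_next = 1 - s
    r = 1.0 if s == 0 else 0.0
    x, x_next = phi[s], phi[s_next]
    delta = r + gamma * theta @ x_next - theta @ x   # TD error
    w += beta * (delta - x @ w) * x                  # ascent in the dual w
    theta += alpha * (x - gamma * x_next) * (x @ w)  # descent in the primal theta
    s = s_next

# Bellman fixed point: v0 = 1/(1 - gamma^2), v1 = gamma * v0
print(theta)  # approaches [5.263, 4.737]
```

The stability of such coupled ascent-descent iterations is exactly what the primal-dual gradient dynamics results invoked in the paper address: the linearized system is Hurwitz whenever the covariance matrix C is positive definite and A is nonsingular, so small constant step sizes suffice here.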
Keywords: optimization, stability, convergence, saddle-point problem, reinforcement learning (RL), temporal-difference (TD) learning
Recommendations
- Technical update: Least-squares temporal difference learning
- Practical issues in temporal difference learning
- Hyperbolically Discounted Temporal Difference Learning
- Differential Temporal Difference Learning