New Versions of Gradient Temporal-Difference Learning
Publication: 6093230
DOI: 10.1109/TAC.2022.3213763 · arXiv: 2109.04033 · MaRDI QID: Q6093230
Author name not available
Publication date: 6 October 2023
Published in: IEEE Transactions on Automatic Control
Abstract: Sutton, Szepesvári, and Maei introduced the first gradient temporal-difference (GTD) learning algorithms compatible with both linear function approximation and off-policy training. The goals of this paper are (a) to propose several variants of GTD algorithms together with an extensive comparative analysis, and (b) to establish new theoretical analysis frameworks for them. The variants are based on convex-concave saddle-point interpretations of GTD algorithms, which unify all of them in a single framework and admit a simple stability analysis based on recent results on primal-dual gradient dynamics. Finally, a numerical comparative analysis is given to evaluate the approaches.
Full work available at URL: https://arxiv.org/abs/2109.04033
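To illustrate the saddle-point view mentioned in the abstract, here is a minimal sketch (not the paper's proposed variants) of the classical GTD2 update interpreted as a primal-dual gradient method on the convex-concave Lagrangian L(θ, w) = wᵀ(b − Aθ) − ½ wᵀCw, whose saddle point minimizes the mean-squared projected Bellman error. The environment, features, and step sizes below are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: GTD2 as primal-dual gradient dynamics.
# Assumptions (not from the paper): a deterministic 2-state chain
# 0 -> 1 (reward 1), 1 -> 0 (reward 0), tabular features, on-policy.
gamma = 0.9
phi = np.eye(2)          # tabular feature vectors
theta = np.zeros(2)      # primal variable: value-function weights
w = np.zeros(2)          # dual variable: correction weights
alpha, beta = 0.05, 0.1  # primal / dual step sizes

s = 0
for _ in range(20000):
    s_next = 1 - s
    r = 1.0 if s == 0 else 0.0
    x, x_next = phi[s], phi[s_next]
    delta = r + gamma * theta @ x_next - theta @ x   # TD error
    w += beta * (delta - x @ w) * x                  # ascent in the dual w
    theta += alpha * (x - gamma * x_next) * (x @ w)  # descent in the primal theta
    s = s_next

# Bellman fixed point: v0 = 1/(1 - gamma^2), v1 = gamma * v0
print(theta)  # approaches [5.263, 4.737]
```

The stability of such coupled ascent-descent iterations is exactly what the primal-dual gradient dynamics results invoked in the paper address: the linearized system is Hurwitz whenever the covariance matrix C is positive definite and A is nonsingular, so small constant step sizes suffice here.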
Keywords: optimization, stability, convergence, saddle-point problem, reinforcement learning (RL), temporal-difference (TD) learning
Recommendations
- Technical update: Least-squares temporal difference learning
- Practical issues in temporal difference learning
- Hyperbolically Discounted Temporal Difference Learning
- Differential Temporal Difference Learning