Linear least-squares algorithms for temporal difference learning
DOI: 10.1007/BF00114723
zbMath: 0845.68091
MaRDI QID: Q1911340
Steven J. Bradtke, Andrew G. Barto
Publication date: 10 June 1996
Published in: Machine Learning
Learning and adaptive systems in artificial intelligence (68T05) ⋮ Parallel algorithms in computer science (68W10)
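This paper introduced the least-squares family of temporal difference algorithms (LSTD). As a quick orientation, below is a minimal sketch of the batch LSTD(0) estimate in its common textbook form: accumulate A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and b = Σ φ(s)·r over observed transitions, then solve Aθ = b for the linear value-function weights. The `lstd0` function, the transition-tuple format, the discount factor, the ridge term, and the toy chain are illustrative assumptions, not the paper's exact presentation.

```python
import numpy as np

def lstd0(transitions, phi, gamma=0.95, ridge=1e-6):
    """Batch LSTD(0): solve A theta = b built from logged transitions.

    transitions: iterable of (s, r, s_next, done) tuples (illustrative format).
    phi:         feature map, phi(s) -> 1-D numpy array of length d.
    ridge:       small diagonal term so the solve stays well posed on short runs.
    """
    d = len(phi(transitions[0][0]))
    A = np.zeros((d, d))
    b = np.zeros(d)
    for s, r, s_next, done in transitions:
        f = phi(s)
        f_next = np.zeros(d) if done else phi(s_next)
        A += np.outer(f, f - gamma * f_next)  # A += phi (phi - gamma phi')^T
        b += r * f                            # b += r * phi
    # Solve for the value-function weights; the ridge keeps A invertible.
    return np.linalg.solve(A + ridge * np.eye(d), b)

# Tiny illustrative 3-state chain with tabular (one-hot) features.
phi = lambda s: np.eye(3)[s]
data = [(0, 0.0, 1, False), (1, 0.0, 2, False), (2, 1.0, 2, True)] * 50
theta = lstd0(data, phi)
print(theta)  # estimated state values under the logged behavior
```

Unlike incremental TD(0), this batch solve uses each sample once and has no step-size parameter; the paper also develops a recursive (RLS-style) variant of the same idea.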
Related Items
A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning ⋮ Temporal difference-based policy iteration for optimal control of stochastic systems ⋮ Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning ⋮ Model-Based Reinforcement Learning for Partially Observable Games with Sampling-Based State Estimation ⋮ Regularized feature selection in reinforcement learning ⋮ Variance Regularization in Sequential Bayesian Optimization ⋮ Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation ⋮ A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation ⋮ Off-Policy Estimation of Long-Term Average Outcomes With Applications to Mobile Health
Cites Work
- Recursive estimation and time-series analysis. An introduction
- Instrumental variable methods for system identification
- Asynchronous stochastic approximation and Q-learning
- Practical issues in temporal difference learning
- \({\mathcal Q}\)-learning
- The convergence of \(TD(\lambda)\) for general \(\lambda\)
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
- A Stochastic Approximation Method