Average cost temporal-difference learning
Publication:1805802
DOI: 10.1016/S0005-1098(99)00099-0 ⋮ zbMath: 0932.93085 ⋮ MaRDI QID: Q1805802
John N. Tsitsiklis, Benjamin van Roy
Publication date: 28 February 2000
Published in: Automatica
Dynamic programming in optimal control and differential games (49L20) ⋮ Dynamic programming (90C39) ⋮ Optimal stochastic control (93E20) ⋮ Stochastic learning and adaptive control (93E35)
Related Items (19)
A time aggregation approach to Markov decision processes ⋮ Approximate policy iteration: a survey and some new methods ⋮ Multiscale Q-learning with linear function approximation ⋮ Scalable Reinforcement Learning for Multiagent Networked Systems ⋮ Reinforcement learning based algorithms for average cost Markov decision processes ⋮ A stability criterion for two timescale stochastic approximation schemes ⋮ Risk-Sensitive Reinforcement Learning via Policy Gradient Search ⋮ An online actor-critic algorithm with function approximation for constrained Markov decision processes ⋮ Efficient Multi-objective Reinforcement Learning via Multiple-gradient Descent with Iteratively Discovered Weight-Vector Sets ⋮ Adaptive data-aware utility-based scheduling in resource-constrained systems ⋮ Hyperbolically Discounted Temporal Difference Learning ⋮ Long-Term Reward Prediction in TD Models of the Dopamine System ⋮ Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning ⋮ Projected equation methods for approximate solution of large linear systems ⋮ Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation ⋮ Natural actor-critic algorithms ⋮ Fundamental design principles for reinforcement learning algorithms ⋮ Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning ⋮ Actor-Critic Algorithms with Online Feature Adaptation