Average cost temporal-difference learning

From MaRDI portal

Publication:1805802

DOI: 10.1016/S0005-1098(99)00099-0; zbMath: 0932.93085; MaRDI QID: Q1805802

John N. Tsitsiklis, Benjamin van Roy

Publication date: 28 February 2000

Published in: Automatica

zbMATH Keywords

convergence; dynamic programming; learning; mixing time; average cost; aperiodic Markov chain

Mathematics Subject Classification ID

Dynamic programming in optimal control and differential games (49L20); Dynamic programming (90C39); Optimal stochastic control (93E20); Stochastic learning and adaptive control (93E35)

Related Items (19)

A time aggregation approach to Markov decision processes ⋮ Approximate policy iteration: a survey and some new methods ⋮ Multiscale Q-learning with linear function approximation ⋮ Scalable Reinforcement Learning for Multiagent Networked Systems ⋮ Reinforcement learning based algorithms for average cost Markov decision processes ⋮ A stability criterion for two timescale stochastic approximation schemes ⋮ Risk-Sensitive Reinforcement Learning via Policy Gradient Search ⋮ An online actor-critic algorithm with function approximation for constrained Markov decision processes ⋮ Efficient Multi-objective Reinforcement Learning via Multiple-gradient Descent with Iteratively Discovered Weight-Vector Sets ⋮ Adaptive data-aware utility-based scheduling in resource-constrained systems ⋮ Hyperbolically Discounted Temporal Difference Learning ⋮ Long-Term Reward Prediction in TD Models of the Dopamine System ⋮ Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning ⋮ Projected equation methods for approximate solution of large linear systems ⋮ Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation ⋮ Natural actor-critic algorithms ⋮ Fundamental design principles for reinforcement learning algorithms ⋮ Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning ⋮ Actor-Critic Algorithms with Online Feature Adaptation
