scientific article
Publication:2953645
zbMath: 1404.68124, arXiv: 1511.07471, MaRDI QID: Q2953645
Publication date: 5 January 2017
Full work available at URL: https://arxiv.org/abs/1511.07471
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
convergence, Markov decision processes, importance sampling, stochastic approximation, reinforcement learning, approximate policy evaluation, temporal-difference methods
Related Items
Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning
Gradient temporal-difference learning for off-policy evaluation using emphatic weightings
Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning
Distributed consensus-based multi-agent temporal-difference learning
On Generalized Bellman Equations and Temporal-Difference Learning
Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning
Convergence of Recursive Stochastic Algorithms Using Wasserstein Divergence