scientific article
From MaRDI portal
Publication:2810885
zbMath: 1360.68712, arXiv: 1503.04269, MaRDI QID: Q2810885
Martha White, Richard S. Sutton, Ashique Rupam Mahmood
Publication date: 6 June 2016
Full work available at URL: https://arxiv.org/abs/1503.04269
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Related Items
Gradient temporal-difference learning for off-policy evaluation using emphatic weightings
Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning
Distributed consensus-based multi-agent temporal-difference learning
Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning
On Generalized Bellman Equations and Temporal-Difference Learning
Off-policy temporal difference learning with distribution adaptation in fast mixing chains
Statistical Inference for Online Decision Making via Stochastic Gradient Descent
Multi-agent reinforcement learning: a selective overview of theories and algorithms