Batch mode reinforcement learning based on the synthesis of artificial trajectories
Publication:378762
DOI: 10.1007/s10479-012-1248-5 | zbMath: 1276.68134 | OpenAlex: W2134689794 | Wikidata: Q42258641 | Scholia: Q42258641 | MaRDI QID: Q378762
Damien Ernst, Louis Wehenkel, Raphael Fonteneau, Susan A. Murphy
Publication date: 12 November 2013
Published in: Annals of Operations Research
Full work available at URL: http://europepmc.org/articles/pmc3773886
Learning and adaptive systems in artificial intelligence (68T05); Stochastic learning and adaptive control (93E35)
Related Items
Cites Work
- Unnamed Item (5 entries)
- Kernel-based reinforcement learning
- Technical update: Least-squares temporal difference learning
- Least squares policy evaluation algorithms with linear function approximation
- Towards Min Max Generalization in Reinforcement Learning
- Marginal Mean Models for Dynamic Regimes
- Optimal Dynamic Treatment Regimes
- A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect
- DOI: 10.1162/1532443041827907
- Machine Learning: ECML 2003