Q(λ) with Off-Policy Corrections
From MaRDI portal
Publication:2831390
DOI: 10.1007/978-3-319-46379-7_21 · zbMath: 1466.68067 · arXiv: 1602.04951 · OpenAlex: W2962766894 · MaRDI QID: Q2831390
Rémi Munos, Anna Harutyunyan, Marc G. Bellemare, Tom Stepleton
Publication date: 9 November 2016
Published in: Lecture Notes in Computer Science
Full work available at URL: https://arxiv.org/abs/1602.04951
Related Items (9)
Unnamed Item ⋮ Optimistic reinforcement learning by forward Kullback-Leibler divergence optimization ⋮ Classification with costly features as a sequential decision-making problem ⋮ Deep Reinforcement Learning: A State-of-the-Art Walkthrough ⋮ On Generalized Bellman Equations and Temporal-Difference Learning ⋮ Q(λ) with Off-Policy Corrections ⋮ TD-regularized actor-critic methods ⋮ Unnamed Item ⋮ Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients