Gradient temporal-difference learning for off-policy evaluation using emphatic weightings
Publication:6146179
DOI: 10.1016/j.ins.2021.08.082; OpenAlex: W3196183762; MaRDI QID: Q6146179
Fei Zhu, Shan Zhong, Quan Liu, Qiming Fu, Jiaqing Cao
Publication date: 10 January 2024
Published in: Information Sciences
Full work available at URL: https://doi.org/10.1016/j.ins.2021.08.082
Keywords: reinforcement learning; temporal-difference learning; off-policy evaluation; emphatic approach; gradient temporal-difference learning
Cites Work
- \({\mathcal Q}\)-learning
- Recruitment-imitation mechanism for evolutionary reinforcement learning
- An analysis of temporal-difference learning with function approximation
- Marginal Mean Models for Dynamic Regimes
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning