On-policy concurrent reinforcement learning
Publication:4670596
DOI: 10.1080/09528130412331297956
zbMath: 1066.68106
OpenAlex: W1972847450
Wikidata: Q113437934
Scholia: Q113437934
MaRDI QID: Q4670596
Jing Peng, Bikramjit Banerjee, Sandip Sen
Publication date: 22 April 2005
Published in: Journal of Experimental & Theoretical Artificial Intelligence
Full work available at URL: https://doi.org/10.1080/09528130412331297956
Cites Work
- Fast online \(Q(\lambda)\)
- Convergence results for single-step on-policy reinforcement-learning algorithms
- Multiagent learning using a variable learning rate
- \({\mathcal Q}\)-learning
- Two-person nonzero-sum games and quadratic programming
- Non-cooperative games
- An analysis of temporal-difference learning with function approximation