Temporal-difference search in Computer Go
Publication:420936
DOI: 10.1007/s10994-012-5280-0
zbMath: 1238.91044
OpenAlex: W2153039919
MaRDI QID: Q420936
Publication date: 23 May 2012
Published in: Machine Learning
Full work available at URL: https://doi.org/10.1007/s10994-012-5280-0
reinforcement learning; Computer Go; Monte Carlo search; simulation based search; temporal-difference learning
Monte Carlo methods (65C05); Markov and semi-Markov decision processes (90C40); Computational methods for problems pertaining to game theory, economics, and finance (91-08); Combinatorial games (91A46)
Related Items (2)
- Simulation-based search
- Default policies for global optimisation of noisy functions with severe noise
Cites Work
- Unnamed Item
- Unnamed Item
- Analytical mean squared error curves for temporal difference learning
- Convergence results for single-step on-policy reinforcement-learning algorithms
- Learning to play chess using temporal differences
- 10.1162/153244303768966102
- Amazons Discover Monte-Carlo
- An Analysis of UCT in Multi-player Games
- Whole-History Rating: A Bayesian Rating System for Players of Time-Varying Strength
- Computer Go
- Finite-time analysis of the multiarmed bandit problem