Modification of improved upper confidence bounds for regulating exploration in Monte-Carlo tree search
From MaRDI portal
Publication: 307787
DOI: 10.1016/j.tcs.2016.06.034 · zbMath: 1370.68263 · OpenAlex: W2467873743 · MaRDI QID: Q307787
Yun-Ching Liu, Yoshimasa Tsuruoka
Publication date: 5 September 2016
Published in: Theoretical Computer Science
Full work available at URL: https://doi.org/10.1016/j.tcs.2016.06.034
MSC classification:
- Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.) (68T20)
- Combinatorial games (91A46)
- Probabilistic games; gambling (91A60)
Cites Work
- UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
- Asymptotically efficient adaptive allocation rules
- Simple Regret Optimization in Online Planning for Markov Decision Processes
- Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis
- Pure Exploration in Multi-armed Bandits Problems
- Finite-time analysis of the multiarmed bandit problem
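For context, the selection rule at the heart of this line of work is UCB1, introduced in "Finite-time analysis of the multiarmed bandit problem" (cited above), which MCTS uses at each tree node to balance exploration and exploitation. A minimal sketch in Python (function and variable names are illustrative, not from the paper):

```python
import math

def ucb1_select(counts, values, c=math.sqrt(2)):
    """Return the index of the arm maximizing the UCB1 score:
    empirical mean + c * sqrt(ln(total pulls) / pulls of this arm).
    Arms that have never been pulled are selected first."""
    total = sum(counts)
    # Pull each arm at least once before applying the formula.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [
        values[i] / counts[i] + c * math.sqrt(math.log(total) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)
```

Here `counts[i]` is the number of times arm `i` was pulled and `values[i]` its cumulative reward; the exploration constant `c` regulates how strongly under-sampled arms are favored, which is the quantity the paper's modified bounds aim to control.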