Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning
Publication: 2094051
DOI: 10.1007/978-3-030-60990-0_19
OpenAlex: W3175189009
MaRDI QID: Q2094051
Authors: Isaac J. Sledge, Jose C. Principe
Publication date: 28 October 2022
Full work available at URL: https://doi.org/10.1007/978-3-030-60990-0_19
Cites Work
- UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
- Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
- Asymptotically efficient adaptive allocation rules
- Convergence results for single-step on-policy reinforcement-learning algorithms
- Pure exploration in finitely-armed and continuous-armed bandits
- Convex Optimization: Algorithms and Complexity
- Learning Theory
- Markovian Decision Processes with Uncertain Transition Probabilities
- The Nonstochastic Multiarmed Bandit Problem
- DOI: 10.1162/153244303321897663
- Asymmetry of Risk and Value of Information
- Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
- Finite-time analysis of the multiarmed bandit problem