Finding the optimal exploration-exploitation trade-off online through Bayesian risk estimation and minimization
Publication:6566614
DOI: 10.1016/J.ARTINT.2024.104096
MaRDI QID: Q6566614
Jonathan P. How, Stewart Jamieson, Yogesh Girdhar
Publication date: 3 July 2024
Published in: Artificial Intelligence
Cites Work
- Title not available (6 entries)
- Planning and acting in partially observable stochastic domains
- Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges
- Asymptotic methods in statistical decision theory
- Efficient global optimization of expensive black-box functions
- Congresso internazionale dei matematici. Argomenti delle communicazioni. 3-10 settembre 1928.
- Bayesian look ahead one-stage sampling allocations for selection of the best population
- On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
- Bayesian reinforcement learning: a survey
- Scalable and efficient Bayes-adaptive reinforcement learning based on Monte-Carlo tree search
- The knowledge gradient algorithm for a general class of online learning problems
- Near-optimal regret bounds for reinforcement learning
- Regret bounds and minimax policies under partial monitoring
- The knowledge-gradient policy for correlated normal beliefs
- DOI: 10.1162/153244303765208377
- A Knowledge-Gradient Policy for Sequential Information Collection
- Pure Exploration in Multi-armed Bandits Problems
- An analysis of approximations for maximizing submodular set functions—I
- Discrete Convex Analysis
- Finite-Time Analysis for the Knowledge-Gradient Policy
- Learning to Optimize via Information-Directed Sampling
- Technical Note—A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents
- Bandit Algorithms
- Simple Bayesian Algorithms for Best-Arm Identification
- Partial Monitoring—Classification, Regret Bounds, and Algorithms
- Learning to Optimize via Posterior Sampling
- Risk-Sensitive Reinforcement Learning
- On Stochastic Limit and Order Relationships
- Finite-time analysis of the multiarmed bandit problem
- Reinforcement Learning, Bit by Bit