Finding the optimal exploration-exploitation trade-off online through Bayesian risk estimation and minimization

Cites Work

Planning and acting in partially observable stochastic domains
Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges
Asymptotic methods in statistical decision theory
Efficient global optimization of expensive black-box functions
Congresso internazionale dei matematici. Argomenti delle communicazioni. 3-10 settembre 1928.
Bayesian look ahead one-stage sampling allocations for selection of the best population
On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
Bayesian reinforcement learning: a survey
Scalable and efficient Bayes-adaptive reinforcement learning based on Monte-Carlo tree search
The knowledge gradient algorithm for a general class of online learning problems
Near-optimal regret bounds for reinforcement learning
Regret bounds and minimax policies under partial monitoring
The knowledge-gradient policy for correlated normal beliefs
10.1162/153244303765208377
A Knowledge-Gradient Policy for Sequential Information Collection
Pure Exploration in Multi-armed Bandits Problems
An analysis of approximations for maximizing submodular set functions—I
Discrete Convex Analysis
Finite-Time Analysis for the Knowledge-Gradient Policy
Learning to Optimize via Information-Directed Sampling
Technical Note—A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents
Bandit Algorithms
Simple Bayesian Algorithms for Best-Arm Identification
Partial Monitoring—Classification, Regret Bounds, and Algorithms
Learning to Optimize via Posterior Sampling
Risk-Sensitive Reinforcement Learning
On Stochastic Limit and Order Relationships
Finite-time analysis of the multiarmed bandit problem
Reinforcement Learning, Bit by Bit

This page was built for publication: Finding the optimal exploration-exploitation trade-off online through Bayesian risk estimation and minimization

MaRDI item: Q6566614