Learning to Optimize via Information-Directed Sampling
From MaRDI portal
Publication:4969321
DOI: 10.1287/opre.2017.1663
zbMath: 1458.90497
arXiv: 1403.5556
OpenAlex: W2765733960
MaRDI QID: Q4969321
Daniel J. Russo, Benjamin van Roy
Publication date: 5 October 2020
Published in: Operations Research
Full work available at URL: https://arxiv.org/abs/1403.5556
Related Items (10)
Dynamic Programs with Shared Resources and Signals: Dynamic Fluid Policies and Asymptotic Optimality ⋮ Optimistic Gittins Indices ⋮ Unnamed Item ⋮ Approximating the operating characteristics of Bayesian uncertainty directed trial designs ⋮ Online team formation under different synergies ⋮ Reinforcement Learning, Bit by Bit ⋮ Simple Bayesian Algorithms for Best-Arm Identification ⋮ Exploratory distributions for convex functions ⋮ Unnamed Item ⋮ Unnamed Item
Cites Work
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- An informational approach to the global optimization of expensive-to-evaluate functions
- Asymptotically efficient adaptive allocation rules
- Adaptive treatment allocation and the multi-armed bandit problem
- Bayesian experimental design: A review
- Refined knowledge-gradient policy for learning probabilities
- Bisection Search with Noisy Responses
- The Knowledge Gradient Algorithm for a General Class of Online Learning Problems
- Computing a Classic Index for Finite-Horizon Bandits
- Asymptotically efficient adaptive allocation schemes for controlled Markov chains: finite parameter space
- Entropy and Information Theory
- Multi‐Armed Bandit Allocation Indices
- Dynamic Assortment Optimization with a Multinomial Logit Choice Model and Capacity Constraint
- Dynamic Pricing Under a General Parametric Choice Model
- Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis
- Linearly Parameterized Bandits
- On a Measure of the Information Provided by an Experiment
- A Knowledge-Gradient Policy for Sequential Information Collection
- Asymptotically efficient adaptive allocation schemes for controlled i.i.d. processes: finite parameter space
- Asymptotically Efficient Adaptive Choice of Control Laws in Controlled Markov Chains
- Regret in Online Combinatorial Optimization
- Partial Monitoring—Classification, Regret Bounds, and Algorithms
- Learning to Optimize via Posterior Sampling
- Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting
- Twenty Questions with Noise: Bayes Optimal Policies for Entropy Loss
- Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
- Finite-time analysis of the multiarmed bandit problem