ON THE IDENTIFICATION AND MITIGATION OF WEAKNESSES IN THE KNOWLEDGE GRADIENT POLICY FOR MULTI-ARMED BANDITS
From MaRDI portal
Publication:5358114
DOI: 10.1017/S0269964816000279
zbMath: 1414.91105
arXiv: 1607.05970
OpenAlex: W3104196082
MaRDI QID: Q5358114
No author found.
Publication date: 19 September 2017
Published in: Probability in the Engineering and Informational Sciences
Full work available at URL: https://arxiv.org/abs/1607.05970
Cites Work
- Stochastic orders
- On the Gittins index for multiarmed bandits
- Efficient global optimization of expensive black-box functions
- Optimal learning and experimentation in bandit problems.
- Bayesian look ahead one-stage sampling allocations for selection of the best population
- Optimal learning with non-Gaussian rewards
- The Knowledge Gradient Algorithm for a General Class of Online Learning Problems
- The Knowledge-Gradient Policy for Correlated Normal Beliefs
- Multi‐Armed Bandit Allocation Indices
- A Knowledge-Gradient Policy for Sequential Information Collection
- Learning to Optimize via Posterior Sampling