On Upper-Confidence Bound Policies for Switching Bandit Problems
Publication:3093948
DOI: 10.1007/978-3-642-24412-4_16; zbMATH: 1349.60070; OpenAlex: W157259654; MaRDI QID: Q3093948
Eric Moulines, Aurélien Garivier
Publication date: 19 October 2011
Published in: Lecture Notes in Computer Science
Full work available at URL: https://doi.org/10.1007/978-3-642-24412-4_16
MSC classification:
Learning and adaptive systems in artificial intelligence (68T05)
Stopping times; optimal stopping problems; gambling theory (60G40)
Related Items (11)
Robust sequential design for piecewise-stationary multi-armed bandit problem in the presence of outliers
Tracking the market: dynamic pricing and learning in a changing environment
Improving multi-armed bandit algorithms in online pricing settings
Lipschitzness is all you need to tame off-policy generative adversarial imitation learning
Unnamed Item
Optimal Exploration–Exploitation in a Multi-armed Bandit Problem with Non-stationary Rewards
Learning the distribution with largest mean: two bandit frameworks
Finite-Time Analysis for the Knowledge-Gradient Policy
Unnamed Item
Context tree selection: a unifying view
Order scoring, bandit learning and order cancellations