Regret bounds for Narendra-Shapiro bandit algorithms
From MaRDI portal
Publication: 5086451
DOI: 10.1080/17442508.2018.1457675 · zbMath: 1498.60303 · arXiv: 1502.04874 · OpenAlex: W1693596837 · MaRDI QID: Q5086451
Sébastien Gadat, Fabien Panloup, Sofiane Saadane
Publication date: 5 July 2022
Published in: Stochastics
Full work available at URL: https://arxiv.org/abs/1502.04874
Mathematics Subject Classification:
- Continuous-time Markov processes on general state spaces (60J25)
- Markov chains (discrete-time Markov processes on discrete state spaces) (60J10)
- Sequential statistical analysis (62L10)
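For context, the Narendra-Shapiro algorithm named in the title is a stochastic learning automaton for the multi-armed bandit problem. The following is a minimal illustrative sketch of the classical two-armed reward-only ("linear reward-inaction") update, not the penalized variant analyzed in the paper; the reward probabilities, step size, and horizon are all hypothetical choices for the example.

```python
import random

def narendra_shapiro(p_rewards, gamma=0.01, n_steps=10000, seed=0):
    """Classical two-armed Narendra-Shapiro scheme (reward-only variant).

    x is the probability of playing arm 1. When the played arm yields a
    reward, x is moved toward that arm by a step of size gamma; on a
    failure, nothing happens (the "inaction" part of the scheme).
    """
    rng = random.Random(seed)
    x = 0.5  # start with no preference between the two arms
    for _ in range(n_steps):
        arm = 1 if rng.random() < x else 2
        reward = rng.random() < p_rewards[arm - 1]  # Bernoulli reward
        if reward:
            if arm == 1:
                x = x + gamma * (1 - x)  # reinforce arm 1
            else:
                x = x - gamma * x        # reinforce arm 2
    return x
```

With a markedly better first arm, x drifts toward 1; the paper's "penalized" and "over-penalized" variants add a small move away from the played arm on failure, which is what makes nontrivial regret bounds possible.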
Cites Work
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- Total variation estimates for the TCP process
- A penalized bandit algorithm
- Stochastic approximation methods for constrained and unconstrained systems
- When can the two-armed bandit algorithm be trusted?
- On the linear model with two absorbing barriers
- Long time behavior of Markov processes and beyond
- How Fast Is the Bandit?
- The Nonstochastic Multiarmed Bandit Problem
- 10.1162/153244303321897663
- Use of Stochastic Automata for Parameter Self-Optimization with Multimodal Performance Criteria
- Some aspects of the sequential design of experiments