When can the two-armed bandit algorithm be trusted?
Publication: 1879915
DOI: 10.1214/105051604000000350
zbMath: 1048.62079
arXiv: math/0407128
OpenAlex: W4297424836
MaRDI QID: Q1879915
Damien Lamberton, Gilles Pagès, Pierre Tarrès
Publication date: 15 September 2004
Published in: The Annals of Applied Probability
Full work available at URL: https://arxiv.org/abs/math/0407128
Mathematics Subject Classification:
- Adaptive control/observation systems (93C40)
- Strong limit theorems (60F15)
- Stochastic approximation (62L20)
- Stopping times; optimal stopping problems; gambling theory (60G40)
- Sequential statistical design (62L05)
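For orientation, the recursion the paper analyzes is the classical Narendra-Shapiro two-armed bandit scheme. Below is a minimal simulation sketch, assuming reward probabilities p_a and p_b and a decreasing step sequence gamma_n = c/(n+c); the function name, parameters, and step choice are illustrative assumptions, not taken from this record.

import random

def two_armed_bandit(p_a, p_b, n_steps, c=1.0, x0=0.5, seed=None):
    # Narendra-Shapiro two-armed bandit: x is the probability of playing
    # arm A. A rewarded pull moves x toward the pulled arm by the current
    # step gamma_n; unrewarded pulls leave x unchanged.
    # gamma_n = c/(n+c) is one illustrative step sequence with gamma_n < 1;
    # the paper's "trust" question concerns which step choices rule out
    # convergence to the inferior arm.
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_steps + 1):
        gamma = c / (n + c)
        if rng.random() < x:            # play arm A
            if rng.random() < p_a:      # arm A rewarded: reinforce A
                x += gamma * (1.0 - x)
        else:                           # play arm B
            if rng.random() < p_b:      # arm B rewarded: reinforce B
                x -= gamma * x
    return x

# Example: with p_a > p_b the iterate typically approaches 1 (arm A),
# but a positive probability of settling on the wrong arm can remain,
# which is the failure mode the paper studies.
print(two_armed_bandit(p_a=0.7, p_b=0.4, n_steps=100_000, seed=42))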
Related Items (11)
- On the robustness of learning in games with stochastically perturbed payoff observations
- Stochastic approximation of quasi-stationary distributions on compact spaces and applications
- Penalty-Regulated Dynamics and Robust Learning Procedures in Games
- Randomized urn models revisited using stochastic approximation
- Regret bounds for Narendra-Shapiro bandit algorithms
- Robustness of stochastic bandit policies
- Analysis of the smoothly amnesia-reinforced multidimensional elephant random walk
- How Fast Is the Bandit?
- On ergodic two-armed bandits
- Convergence in models with bounded expected relative hazard rates
- Nonlinear randomized urn models: a stochastic approximation viewpoint
Cites Work
- Nonconvergence to unstable points in urn models and stochastic approximations
- Stochastic approximation methods for constrained and unconstrained systems
- Stochastic algorithms
- Asymptotic pseudotrajectories and chain recurrent flows, with applications
- Do stochastic algorithms avoid traps?
- On the linear model with two absorbing barriers
- Learning Automata - A Survey
- Decreasing step stochastic algorithms: a.s. behaviour of weighted empirical measures
- Pièges répulsifs [Repulsive traps]
- Use of Stochastic Automata for Parameter Self-Optimization with Multimodal Performance Criteria