Adaptive treatment allocation and the multi-armed bandit problem (Q1102059)
scientific article; zbMATH DE number 4048889
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Adaptive treatment allocation and the multi-armed bandit problem | scientific article; zbMATH DE number 4048889 | |
Statements
Adaptive treatment allocation and the multi-armed bandit problem (English)
1987
There are \(k\) distinct statistical populations, each specified by a univariate density function characterized by a parameter of unknown value. The question concerns how \(x_1, x_2, \ldots, x_N\) should be sampled sequentially from the \(k\) populations in order to maximize (in some sense) the mean value of their sum. A class of simple allocation rules based on upper confidence bounds for the population parameters is proposed. These rules are shown to exhibit asymptotic optimality in both a Bayesian and a frequentist sense. A simulation study provides evidence that the rules perform well even for moderate values of \(N\).
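The idea of allocating by upper confidence bounds can be illustrated with a minimal sketch. It does not reproduce the rules proposed in the article: it swaps in the later and widely known UCB1 index (sample mean plus \(\sqrt{2 \ln t / n_j}\)) and assumes Bernoulli populations. The function names `ucb1_index` and `run_ucb`, the success probabilities, and the horizon are all hypothetical choices made for illustration.

```python
import math
import random


def ucb1_index(mean, pulls, t):
    """UCB1 index (an assumption, not the article's bound):
    sample mean plus an exploration bonus that shrinks as pulls grow."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def run_ucb(success_probs, horizon, seed=0):
    """Sample sequentially from Bernoulli populations using UCB1.

    success_probs: hypothetical true parameters, unknown to the rule.
    Returns (number of samples drawn from each population, total reward).
    """
    rng = random.Random(seed)
    k = len(success_probs)
    pulls = [0] * k    # n_j: samples drawn from population j so far
    sums = [0.0] * k   # running sum of observations from population j
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: sample every population once.
            arm = t - 1
        else:
            # Sample the population with the largest upper confidence index.
            arm = max(
                range(k),
                key=lambda j: ucb1_index(sums[j] / pulls[j], pulls[j], t),
            )
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        total += reward
    return pulls, total


if __name__ == "__main__":
    pulls, total = run_ucb([0.2, 0.5, 0.8], horizon=2000)
    print("samples per population:", pulls)
    print("sum of observations:", total)
```

The bonus term is large for populations that have been sampled rarely, so every population keeps being revisited occasionally, while most of the \(N\) samples tend to go to the population whose estimated mean is highest.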
adaptive treatment allocation
multi-armed bandit problem
boundary crossing
adaptive control
dynamic allocation
upper confidence bounds
asymptotic optimality
simulation study