The two-armed bandit with delayed responses (Q1098527)
From MaRDI portal
scientific article; zbMATH DE number 4039059
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | The two-armed bandit with delayed responses | scientific article; zbMATH DE number 4039059 | |
Statements
The two-armed bandit with delayed responses (English)
1988
A general model for a two-armed bandit with delayed responses is introduced and solved with dynamic programming. One arm has geometric lifetime with parameter \(\theta\), which has prior distribution \(\mu\). The other arm has known lifetime with mean \(\kappa\). The response delays completely change the character of the optimal strategies compared with the no-delay case; in particular, the bandit is no longer a stopping problem. The delays also introduce an extra parameter \(p\) into the state space. In clinical trial applications, this parameter represents the number of patients previously treated with the unknown arm who are still living. The value function is introduced and investigated as \(p\), \(\mu\) and \(\kappa\) vary. Under a regularity condition on the discount sequence, there exists a manifold in the state space such that both arms are optimal on the manifold, with arm \(x\) optimal on one side and arm \(y\) on the other. Properties of the manifold are described.
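As a rough illustration of how the extra state parameter \(p\) carries information, the following sketch assumes a Beta(a, b) form for the prior \(\mu\) and reads \(\theta\) as the per-period survival probability. Neither choice is stated in the abstract, and all function names are hypothetical. It shows only a conjugate posterior update after one period of observing the \(p\) living patients, not the dynamic-programming solution described in the article.

```python
from fractions import Fraction


def update_beta(a, b, survivors, deaths):
    """Conjugate update of an assumed Beta(a, b) prior on theta.

    Assumption: theta is the per-period survival probability, so each of the
    p patients still living contributes one Bernoulli(theta) observation.
    """
    return a + survivors, b + deaths


def posterior_mean(a, b):
    """Mean of Beta(a, b), i.e. the estimated per-period survival probability."""
    return Fraction(a, a + b)


def expected_lifetime(a, b):
    """E[1 / (1 - theta)] under Beta(a, b); finite only for b > 1.

    Under the assumed parameterization a lifetime L satisfies
    P(L > k) = theta**k, so E[L | theta] = 1 / (1 - theta).
    """
    if b <= 1:
        raise ValueError("expected lifetime is infinite unless b > 1")
    return Fraction(a + b - 1, b - 1)


# Hypothetical period: p = 4 patients on the unknown arm, 3 survive, 1 dies.
a, b = update_beta(1, 2, survivors=3, deaths=1)
print(a, b, posterior_mean(a, b), expected_lifetime(a, b))
```

Comparing `expected_lifetime` with \(\kappa\) would give only a myopic rule. The abstract indicates that with delayed responses the optimal strategy is more involved, so this sketch should be read as a description of the information state rather than as a policy.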
randomized clinical trials
two-armed bandit with delayed responses
geometric lifetime
optimal strategies
value function