Some monotonicity properties of parametric and nonparametric Bayesian bandits (Q2405170)

The paper studies properties of sequential decision procedures in the Bayesian framework for parametric and nonparametric two-armed bandit problems. At each of \(n\) stages, one of two independent stochastic processes (arms) is selected, and the selection may depend on the past observations and the prior information. The objective is to maximize the expected discounted sum of the \(n\) observations. The author examines structural properties of the classical bandit problem, for example how the maximum expected payoff and the optimal strategy vary with the priors, in two cases: (a) observations from each arm have an exponential family distribution, and the arms are assigned conjugate priors; (b) observations from each arm have a nonparametric distribution, and the arms are assigned independent Dirichlet process priors. The following results are noted: (i) for a particular arm with fixed prior weight, the maximum expected payoff increases as the prior mean yield increases; (ii) for a fixed prior mean yield, the maximum expected payoff increases as the prior weight decreases. Some specializations and the resulting properties are also noted. These results generalize the work of \textit{J. Gittins} and \textit{Y.-G. Wang} [Ann. Stat. 20, No. 3, 1625--1636 (1992; Zbl 0760.62080)] and \textit{M. K. Clayton} and \textit{D. A. Berry} [ibid. 13, 1523--1534 (1985; Zbl 0587.62151)].
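Properties (i) and (ii) can be checked numerically in the simplest special case, a two-armed Bernoulli bandit with independent Beta priors. The sketch below is not taken from the paper: the function name `max_expected_payoff`, the horizon, the discount factor 0.9 and the particular Beta parameters are all illustrative assumptions. It computes the maximum expected discounted payoff by backward induction, taking the prior mean of a Beta(a, b) prior to be a / (a + b) and its prior weight to be a + b.

```python
from functools import lru_cache


def max_expected_payoff(a1, b1, a2, b2, horizon, discount=0.9):
    """Maximum expected discounted payoff of a two-armed Bernoulli bandit.

    Arm k has an independent Beta(a_k, b_k) prior (an illustrative
    assumption). The reward at stage t is weighted by discount**t.
    """
    priors = ((a1, b1), (a2, b2))

    @lru_cache(maxsize=None)
    def value(s1, f1, s2, f2, stage):
        # s_k / f_k count the successes / failures observed so far on arm k.
        if stage == horizon:
            return 0.0
        counts = [[s1, f1], [s2, f2]]
        best = float("-inf")
        for arm in (0, 1):
            a = priors[arm][0] + counts[arm][0]
            b = priors[arm][1] + counts[arm][1]
            p = a / (a + b)  # posterior mean of the arm's success probability

            win = [row[:] for row in counts]
            win[arm][0] += 1
            lose = [row[:] for row in counts]
            lose[arm][1] += 1

            v_win = value(win[0][0], win[0][1], win[1][0], win[1][1], stage + 1)
            v_lose = value(lose[0][0], lose[0][1], lose[1][0], lose[1][1], stage + 1)
            payoff = p * (1.0 + discount * v_win) + (1.0 - p) * discount * v_lose
            best = max(best, payoff)
        return best

    return value(0, 0, 0, 0, 0)


if __name__ == "__main__":
    # (i) Arm 1 has fixed prior weight 4 and increasing prior mean.
    for a, b in [(1, 3), (2, 2), (3, 1)]:
        print("mean", a / (a + b), max_expected_payoff(a, b, 1, 1, horizon=6))
    # (ii) Arm 1 has fixed prior mean 1/2 and decreasing prior weight.
    for a, b in [(4, 4), (2, 2), (1, 1)]:
        print("weight", a + b, max_expected_payoff(a, b, 1, 1, horizon=6))
```

In this toy setting the printed payoffs should be non-decreasing down each list, in line with (i) and (ii); the general exponential family and Dirichlet process cases treated in the paper are not covered by this sketch.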

reviewed by

Rasul A. Khan

zbMATH Keywords

Bernoulli bandits

convex order

optimal stopping

sequential decision