Some monotonicity properties of parametric and nonparametric Bayesian bandits (Q2405170)

scientific article

    Statements

    Some monotonicity properties of parametric and nonparametric Bayesian bandits (English)
    21 September 2017
    The paper studies properties of sequential decision procedures in the Bayesian framework for parametric and nonparametric two-armed bandit problems. At each of \(n\) stages, one of two independent stochastic processes (arms) is selected, and the selection may depend on the past observations and the prior information. The objective is to maximize the expected discounted sum of the \(n\) observations. The author studies structural properties of the classical bandit problem in the Bayesian framework, in particular how the maximum expected payoff and the optimal strategy vary with the priors, in two cases: (a) observations from each arm have an exponential family distribution, and the arms are assigned conjugate priors; (b) observations from each arm have a nonparametric distribution, and the arms are assigned independent Dirichlet process priors. The following results are noted: (i) for a particular arm with fixed prior weight, the maximum expected payoff increases as the prior mean yield increases; (ii) for a fixed prior mean yield, the maximum expected payoff increases as the prior weight decreases. Some special cases and the resulting properties are also discussed. These results generalize those of \textit{J. Gittins} and \textit{Y.-G. Wang} [Ann. Stat. 20, No. 3, 1625--1636 (1992; Zbl 0760.62080)] and \textit{M. K. Clayton} and \textit{D. A. Berry} [ibid. 13, 1523--1534 (1985; Zbl 0587.62151)].
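Results (i) and (ii) can be checked numerically in the simplest parametric special case. The sketch below is an illustration only, not the paper's method. It assumes two Bernoulli arms with independent Beta priors, where an arm with prior mean m and prior weight w is read as Beta(m w, (1 - m) w), and it computes the maximum expected discounted payoff by backward induction over posterior states. The function name `max_expected_payoff`, the `(prior_mean, prior_weight)` parametrization, and the horizon and discount values in the demo are assumptions made for this example.

```python
from functools import lru_cache


def max_expected_payoff(arm1, arm2, n, discount=1.0):
    """Maximum expected discounted payoff over n pulls of two Bernoulli arms.

    Each arm is a hypothetical (prior_mean, prior_weight) pair, interpreted
    as a Beta(mean * weight, (1 - mean) * weight) prior on the arm's
    success probability. prior_mean must lie strictly between 0 and 1.
    """

    @lru_cache(maxsize=None)
    def value(a1, b1, a2, b2, remaining):
        # No pulls left: nothing more can be earned.
        if remaining == 0:
            return 0.0
        # Posterior mean success probability of each arm.
        p1 = a1 / (a1 + b1)
        p2 = a2 / (a2 + b2)
        # Pull arm 1: reward 1 with probability p1, then continue optimally
        # from the updated Beta posterior.
        pull1 = (p1 * (1.0 + discount * value(a1 + 1, b1, a2, b2, remaining - 1))
                 + (1 - p1) * discount * value(a1, b1 + 1, a2, b2, remaining - 1))
        # Pull arm 2: same recursion with the roles of the arms swapped.
        pull2 = (p2 * (1.0 + discount * value(a1, b1, a2 + 1, b2, remaining - 1))
                 + (1 - p2) * discount * value(a1, b1, a2, b2 + 1, remaining - 1))
        return max(pull1, pull2)

    (m1, w1), (m2, w2) = arm1, arm2
    return value(m1 * w1, (1 - m1) * w1, m2 * w2, (1 - m2) * w2, n)


if __name__ == "__main__":
    other_arm = (0.5, 2.0)  # assumed fixed prior for the second arm

    # (i) fixed prior weight, increasing prior mean.
    for mean in (0.3, 0.5, 0.7):
        print("mean", mean, max_expected_payoff((mean, 2.0), other_arm, n=6, discount=0.9))

    # (ii) fixed prior mean, increasing prior weight.
    for weight in (1.0, 2.0, 4.0):
        print("weight", weight, max_expected_payoff((0.5, weight), other_arm, n=6, discount=0.9))
```

Under these assumptions, the printed payoff should be nondecreasing down the first loop and nonincreasing down the second, in line with (i) and (ii). The recursion visits every reachable posterior state, so it is practical only for small \(n\).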
    Bernoulli bandits
    convex order
    optimal stopping
    sequential decision
    two-armed bandits
