Two classes Markov decision processes with perturbations (Q2739190)

From MaRDI portal

Language: English
Label: Two classes Markov decision processes with perturbations
Description: scientific article; zbMATH DE number 1643600

    Statements

    Publication date: 4 March 2004
    Keywords: \(\delta\)-optimal policy; optimal stochastic policy; Markov decision processes; perturbations
    Title: Two classes Markov decision processes with perturbations (English)
    Review:
    Two classes of Markov decision processes (MDPs) with perturbations, both with denumerable state space \(S\) and denumerable action set \(A\), are discussed. Based on the definition of a \(\delta\)-optimal policy, a perturbation model \(P_\varepsilon(D)\) for the discrete-time non-stationary MDP under the criterion of maximizing the limiting average expected reward, and a perturbation model \(C_\varepsilon(D)\) for the continuous-time stationary MDP under the criterion of maximizing the discounted expected reward, are proposed. In both cases the transition probabilities \(p_{n+1}(j|i,a)(\varepsilon)\) and \(q(j|i,a)(\varepsilon)\) (\(i,j\in S\), \(a\in A\)) are perturbed according to a so-called disturbance set \(D\) [cf. \textit{M. Abbad} and \textit{J. A. Filar}, IEEE Trans. Autom. Control 37, 1415-1420 (1992; Zbl 0763.90091)]. It is then proved that if \(\pi\) is an optimal stochastic (Markov or stationary) policy before perturbation, then for any \(\delta>0\) there exists an \(\varepsilon>0\) such that \(\pi\) is \(\delta\)-optimal in the perturbation model (\(P_\varepsilon(D)\) or \(C_\varepsilon(D)\), respectively).
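    A compact way to state the result (a sketch only; the value symbol \(V_\varepsilon^\pi\) is illustrative notation, not taken from the paper): writing \(V_\varepsilon^\pi\) for the value of policy \(\pi\) under the relevant reward criterion in the perturbed model, \(\pi\) is \(\delta\)-optimal in \(P_\varepsilon(D)\) when \[V_\varepsilon^\pi \;\ge\; \sup_{\pi'} V_\varepsilon^{\pi'} - \delta,\] and the theorem asserts that optimality of \(\pi\) at \(\varepsilon=0\) implies that for every \(\delta>0\) there is some \(\varepsilon>0\) for which this inequality holds (analogously for \(C_\varepsilon(D)\)).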

    Identifiers

    Q2739190
    zbMATH DE number 1643600