Two classes Markov decision processes with perturbations (Q2739190)
From MaRDI portal
scientific article; zbMATH DE number 1643600
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Two classes Markov decision processes with perturbations | scientific article; zbMATH DE number 1643600 | |
Statements
4 March 2004
\(\delta\)-optimal policy
optimal stochastic policy
Markov decision processes
perturbations
0.8240287
0.8137373
0.7733067
0.76608217
0.76249015
0.75928205
0.74654406
Two classes Markov decision processes with perturbations (English)
Two classes of Markov decision processes (MDPs) with denumerable state space \(S\) and denumerable action set \(A\), subject to perturbations, are discussed. Based on the notion of a \(\delta\)-optimal policy, two perturbation models are proposed: a model \(P_\varepsilon(D)\) for the discrete-time non-stationary MDP under the criterion of maximizing the limiting average expected reward, and a model \(C_\varepsilon(D)\) for the continuous-time stationary MDP under the criterion of maximizing the discounted expected reward. In the two models, the transition probabilities \(p_{n+1}(j\mid i,a)(\varepsilon)\) and \(q(j\mid i,a)(\varepsilon)\), respectively (\(i,j\in S\), \(a\in A\)), are perturbed according to a so-called disturbance set \(D\) [cf. \textit{M. Abbad} and \textit{J. A. Filar}, IEEE Trans. Autom. Control 37, 1415-1420 (1992; Zbl 0763.90091)]. It is then proved that if \(\pi\) is an optimal stochastic (Markov or stationary) policy before perturbation, then for any \(\delta>0\) there exists an \(\varepsilon>0\) such that \(\pi\) is \(\delta\)-optimal in the perturbation model \(P_\varepsilon(D)\) or \(C_\varepsilon(D)\), respectively.
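The flavour of this result can be illustrated numerically on a much simpler stand-in. The sketch below is only an illustration under stated assumptions: it uses a finite, discrete-time MDP with the discounted criterion (not the denumerable, non-stationary average-reward or continuous-time setting of the paper), a hypothetical perturbation of the form \(P + \varepsilon\Delta\) with zero row sums in place of the disturbance set \(D\), and reads \(\delta\)-optimality as \(V^{*}_{\varepsilon}(i) - V^{\pi}_{\varepsilon}(i)\le\delta\) for every state \(i\). The function names (`policy_value`, `optimal_value`, `optimality_gap`, `find_eps`) and the two-state example are hypothetical. Before perturbation both actions coincide, so the deterministic (hence also stochastic) policy \(\pi\) that always picks action 0 is optimal; after perturbation, action 1 in state 0 becomes strictly better, so \(\pi\) is no longer exactly optimal, but its gap shrinks with \(\varepsilon\).

```python
import numpy as np

# Illustrative sketch only: finite, discrete-time, discounted MDP. The perturbation
# P_eps = P + eps * Delta is a hypothetical stand-in for the paper's disturbance set D.


def policy_value(P, r, policy, gamma):
    """Exact discounted value of a deterministic stationary policy.

    P has shape (A, S, S) with P[a, i, j] = p(j | i, a); r has shape (S, A).
    """
    states = np.arange(r.shape[0])
    P_pi = P[policy, states, :]          # row i is p(. | i, policy[i])
    r_pi = r[states, policy]             # r(i, policy[i])
    return np.linalg.solve(np.eye(len(states)) - gamma * P_pi, r_pi)


def optimal_value(P, r, gamma, tol=1e-12, max_iter=100_000):
    """Optimal discounted value V* computed by value iteration."""
    v = np.zeros(r.shape[0])
    for _ in range(max_iter):
        q = r + gamma * np.einsum("aij,j->ia", P, v)   # q[i, a]
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    raise RuntimeError("value iteration did not converge")


def optimality_gap(P, Delta, r, policy, gamma, eps):
    """max_i (V*_eps(i) - V^pi_eps(i)) in the perturbed model P + eps * Delta."""
    P_eps = P + eps * Delta
    return np.max(optimal_value(P_eps, r, gamma) - policy_value(P_eps, r, policy, gamma))


def find_eps(P, Delta, r, policy, gamma, delta, eps=0.5):
    """Halve eps until `policy` is delta-optimal in the perturbed model."""
    while optimality_gap(P, Delta, r, policy, gamma, eps) > delta:
        eps /= 2
    return eps


gamma = 0.9
r = np.array([[1.0, 1.0],       # state 0 pays 1 under either action
              [0.0, 0.0]])      # state 1 pays 0
P = np.full((2, 2, 2), 0.5)     # both actions identical before perturbation
Delta = np.zeros((2, 2, 2))
Delta[1, 0] = [1.0, -1.0]       # zero row sum: P + eps*Delta stays stochastic for eps <= 0.5
pi = np.array([0, 0])           # optimal (via a tie) before perturbation

for eps in (0.1, 0.01, 0.001):
    print(eps, optimality_gap(P, Delta, r, pi, gamma, eps))
print(find_eps(P, Delta, r, pi, gamma, delta=0.01))
```

In this toy example the gap works out to \(4.95\,\varepsilon/(1-0.9\varepsilon)\): it is strictly positive for every \(\varepsilon>0\), yet it falls below any prescribed \(\delta\) once \(\varepsilon\) is small enough. For \(\delta=0.01\), `find_eps` returns 0.001953125.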