On the properties of \(\epsilon\) (\(\geq 0\)) optimal policies in discounted unbounded return model - MaRDI portal

On the properties of \(\epsilon\) (\(\geq 0\)) optimal policies in discounted unbounded return model (Q581259)

From MaRDI portal





scientific article; zbMATH DE number 4018805
Language: English
Label: On the properties of \(\epsilon\) (\(\geq 0\)) optimal policies in discounted unbounded return model
Description: scientific article; zbMATH DE number 4018805

    Statements

    On the properties of \(\epsilon\) (\(\geq 0\)) optimal policies in discounted unbounded return model (English)
    1987
    This paper investigates the properties of \(\epsilon\)-optimal (\(\epsilon \geq 0\)) policies in the model of \textit{Guo Shizhen} [Math. Economics 1, 109-120 (1984) (Chinese)]. It is shown that, if \(\pi^* = (\pi_0^*, \pi_1^*, \ldots, \pi_n^*, \pi_{n+1}^*, \ldots)\) is a \(\beta\)-discounted optimal policy, then \((\pi_0^*, \pi_1^*, \ldots, \pi_n^*)^{\infty}\) is also a \(\beta\)-discounted optimal policy for all \(n \geq 0\). Under some conditions we prove that the stochastic stationary policy \(\pi_n^{*\infty}\) corresponding to the decision rule \(\pi_n^*\) is also optimal for the same discount factor \(\beta\). We have also shown that, for each \(\beta\)-optimal stochastic stationary policy \(\pi_0^{*\infty}\), the decision rule \(\pi_0^*\) can be decomposed into several decision rules whose corresponding stationary policies are each \(\beta\)-optimal; conversely, a proper convex combination of these decision rules coincides with the original \(\pi_0^*\). We have further proved that for any \((\epsilon, \beta)\)-optimal policy, say \(\pi^* = (\pi_0^*, \pi_1^*, \ldots, \pi_n^*, \pi_{n+1}^*, \ldots)\), the policy \((\pi_0^*, \pi_1^*, \ldots, \pi_{n-1}^*)^{\infty}\) is \(((1-\beta^n)^{-1}\epsilon, \beta)\)-optimal for \(n > 0\). At the end of this paper we mention that the results about convex combinations and decompositions of optimal policies given by \textit{Luo Handong}, \textit{Liu Jiwei} and \textit{Xia Zhihao} [J. Huazhong (Central China) Univ. of Sci. and Technol. 14, No.4 (1986)] can be extended to our case.
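
    The factor \((1-\beta^n)^{-1}\) in the last bound can be made plausible by a short heuristic argument. The following is only a sketch under assumptions that the review does not state, and it is not claimed to be the paper's proof. Assume that \(\pi^*\) is a Markov policy, that \(V_\beta(\pi)\) denotes the expected \(\beta\)-discounted return of \(\pi\) as a function of the initial state, that \(V^* = \sup_\pi V_\beta(\pi)\), and that \((\epsilon,\beta)\)-optimality means \(V_\beta(\pi^*) \geq V^* - \epsilon\) in every state. The symbols \(r_n\) (expected discounted return of the first \(n\) stages under \(\pi_0^*, \ldots, \pi_{n-1}^*\)), \(P_n\) (the corresponding \(n\)-step transition operator), \(\pi^{(n)} = (\pi_n^*, \pi_{n+1}^*, \ldots)\) and \(\sigma = (\pi_0^*, \ldots, \pi_{n-1}^*)^{\infty}\) are notation introduced here for illustration.

    \[
    V^* - \epsilon \;\leq\; V_\beta(\pi^*) \;=\; r_n + \beta^n P_n V_\beta(\pi^{(n)}) \;\leq\; r_n + \beta^n P_n V^*,
    \qquad
    V_\beta(\sigma) \;=\; r_n + \beta^n P_n V_\beta(\sigma).
    \]

    Subtracting the second relation from the first gives

    \[
    V^* - V_\beta(\sigma) \;\leq\; \epsilon + \beta^n P_n \bigl( V^* - V_\beta(\sigma) \bigr).
    \]

    Since \(P_n\) is monotone and maps the constant \(\epsilon\) to itself, iterating this inequality \(k\) times yields

    \[
    V^* - V_\beta(\sigma) \;\leq\; \epsilon \sum_{j=0}^{k-1} \beta^{jn} + \beta^{kn} P_n^k \bigl( V^* - V_\beta(\sigma) \bigr).
    \]

    If the remainder term \(\beta^{kn} P_n^k ( V^* - V_\beta(\sigma) )\) tends to \(0\) as \(k \to \infty\), which is automatic for bounded returns but needs an extra hypothesis in an unbounded return model, then

    \[
    V^* - V_\beta(\sigma) \;\leq\; \frac{\epsilon}{1-\beta^n},
    \]

    which is the stated \(((1-\beta^n)^{-1}\epsilon, \beta)\)-optimality. Setting \(\epsilon = 0\) in the same sketch recovers the optimality of the periodic policies built from an optimal \(\pi^*\).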
    \(\epsilon \)-optimal policy
    \(\beta \)-discounted optimal policy
    stochastic stationary policy
    convex combinations
    decompositions

    Identifiers