Strong 1-optimal stationary policies in denumerable Markov decision processes (Q1108940)
scientific article; zbMATH DE number 4068651
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Strong 1-optimal stationary policies in denumerable Markov decision processes | scientific article; zbMATH DE number 4068651 | |
Statements
Strong 1-optimal stationary policies in denumerable Markov decision processes (English)
0 references
1988
0 references
Consider a Markov decision process with countable state space \(S\), compact action sets and bounded rewards. Let \(V_{\alpha}(\pi,i)\) denote the expected \(\alpha\)-discounted reward under policy \(\pi\), starting in state \(i\). A policy \(\pi^*\) is called a strong 1-optimal policy (SOP) if, for each \(i\in S\), \(\lim_{\alpha \to 1}[V_{\alpha}(\pi^*,i)-\sup_{\pi}V_{\alpha}(\pi,i)]=0\). Under a standard set of assumptions (including the simultaneous Doeblin condition) for the existence of a stationary average optimal policy, the author proves that (i) a stationary SOP exists, and (ii) any limit point, as \(\alpha \to 1\), of stationary \(\alpha\)-discounted optimal policies is a (stationary) SOP.
0 references
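To make the definition concrete, here is a minimal numerical sketch, not taken from the article: a toy two-state MDP (all names and numbers are illustrative) in which both stationary policies are average optimal, yet only one of them is strongly 1-optimal, because the discounted-value gap of the other does not vanish as \(\alpha \to 1\).

```python
import numpy as np

# Toy illustration of strong 1-optimality (hypothetical data, not from the
# article).  pi* is an SOP when, for every state i,
#     V_alpha(pi*, i) - sup_pi V_alpha(pi, i)  ->  0   as alpha -> 1.
#
# Setup: state 1 has two stationary policies, "take_a" and "take_b", both of
# which move to the absorbing state 2 (reward 0 there), but with different
# one-step rewards.  Both policies have long-run average reward 0, so both are
# average optimal; only "take_a" is strongly 1-optimal.

P = np.array([[0.0, 1.0],   # from state 1, either policy moves to state 2
              [0.0, 1.0]])  # state 2 is absorbing

# One-step reward vectors under the two stationary policies; state 2 pays 0.
r = {"take_a": np.array([1.0, 0.0]),
     "take_b": np.array([0.0, 0.0])}

def discounted_value(reward, alpha):
    """V_alpha(pi) = (I - alpha * P_pi)^{-1} r_pi for a stationary policy pi."""
    return np.linalg.solve(np.eye(2) - alpha * P, reward)

for alpha in (0.9, 0.99, 0.999):
    v = {pi: discounted_value(r_pi, alpha) for pi, r_pi in r.items()}
    v_opt = np.maximum(v["take_a"], v["take_b"])    # sup over the two policies
    print(f"alpha={alpha}:",
          "gap(take_a) =", v["take_a"] - v_opt,     # -> 0: strongly 1-optimal
          "gap(take_b) =", v["take_b"] - v_opt)     # stays at -1 in state 1
```

As \(\alpha \to 1\) the gap for "take_a" is identically zero while the gap for "take_b" stays at \(-1\) in state 1, so average optimality alone does not imply strong 1-optimality; this is the sense in which the SOP criterion of the article refines average optimality.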
Markov decision process
0 references
countable state space
0 references
compact action sets
0 references
bounded rewards
0 references
\(\alpha\)-discounted reward
0 references
strong 1-optimal policy
0 references
simultaneous Doeblin condition
0 references
stationary average optimal policy
0 references