Vector-valued Markov decision processes with average reward criterion: the multichain case (Q2711568)
scientific article
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Vector-valued Markov decision processes with average reward criterion: the multichain case | scientific article | |
Statements
2000
finite state-space
finite action-space
Pareto optimal
policy iteration
Vector-valued Markov decision processes with average reward criterion: the multichain case (English)
Conditions for average-reward Pareto optimality over all non-anticipating policies are sought for general vector-valued Markov decision processes with finite state and action spaces. Sufficient conditions, necessary conditions, and conditions that are both necessary and sufficient are obtained for a deterministic stationary policy to be Pareto optimal; these can be phrased in terms of the existence of solutions to certain systems of linear inequalities. A policy iteration algorithm for determining all such optimal policies is described and illustrated.
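As a rough illustration of the objects involved (not the article's conditions or its policy iteration algorithm), the following Python sketch evaluates the average-reward (gain) vector of every deterministic stationary policy in a small multichain MDP and keeps those whose gain is not Pareto-dominated. Assumptions: the gain of a fixed policy is obtained from the standard multichain evaluation equations (I - P)g = 0 and g + (I - P)h = r, which determine g = P*r uniquely; vectors are compared at a single initial state `s0`; and only deterministic stationary policies are compared with each other, not with all non-anticipating policies. The names `gain`, `dominates` and `pareto_policies` are hypothetical, and the brute-force enumeration is exponential in the number of states.

```python
import itertools

import numpy as np


def gain(P, R):
    """Gain matrix g = P* R of a fixed stationary policy (hypothetical helper).

    P: (S, S) transition matrix; R: (S, K) vector-valued one-step rewards.
    Solves (I - P) g = 0 and g + (I - P) h = R. These equations fix g
    uniquely even when P has several recurrent classes (multichain case);
    h is only determined up to a null-space term, so least squares is used.
    """
    S = P.shape[0]
    I = np.eye(S)
    Z = np.zeros((S, S))
    A = np.block([[I - P, Z], [I, I - P]])
    B = np.vstack([np.zeros_like(R), R])
    sol, *_ = np.linalg.lstsq(A, B, rcond=None)
    return sol[:S]  # rows of g, one K-vector per initial state


def dominates(a, b, tol=1e-9):
    """True if vector a Pareto-dominates vector b (>= everywhere, > somewhere)."""
    return bool(np.all(a >= b - tol) and np.any(a > b + tol))


def pareto_policies(P, R, s0):
    """Brute-force Pareto filter over deterministic stationary policies.

    P: (A, S, S) transitions indexed by action; R: (A, S, K) rewards.
    A policy is a tuple giving the action chosen in each state. Returns
    the policies whose gain at initial state s0 is not dominated by the
    gain of another deterministic stationary policy.
    """
    n_actions, S, _ = P.shape
    states = np.arange(S)
    gains = {}
    for pol in itertools.product(range(n_actions), repeat=S):
        acts = list(pol)
        P_pol = P[acts, states, :]  # row s is P[pol[s], s, :]
        R_pol = R[acts, states, :]  # row s is R[pol[s], s, :]
        gains[pol] = gain(P_pol, R_pol)[s0]
    return [p for p, g in gains.items()
            if not any(dominates(g2, g) for q, g2 in gains.items() if q != p)]
```

For example, in a two-state model where action 0 keeps the process in its current state (earning reward (1, 0) in state 0 and (0, 1) in state 1) and action 1 switches state with zero reward, the policy that always switches has gain (0, 0) and is dominated, while the remaining three policies have non-comparable or equal gains and are all returned.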