Markov decision processes (Q5894023)
From MaRDI portal
scientific article; zbMATH DE number 5834332
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Markov decision processes | scientific article; zbMATH DE number 5834332 | |
Statements
Markov decision processes (English)
10 January 2011
The authors consider the fully observable Markov decision process over finite and infinite time horizons. In the finite-horizon case, the Markov decision problem can be solved via a Bellman equation under an integrability assumption and a structure assumption. Sufficient conditions for both assumptions are discussed, and two applications, a card game and stochastic linear-quadratic control problems, illustrate the proposed solution method. In the infinite-horizon case, the authors show that, under certain conditions, both the reward value and the optimal policy can be approximated by the sequence of reward values and optimal policies of the corresponding finite-horizon problems. Bandit and dividend pay-out problems illustrate this part of the theory.
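As a rough illustration of the two ideas summarised in the review, the following minimal Python sketch solves a finite-horizon problem by backward induction on the Bellman equation and then compares the stage-0 values for two long horizons. It is not taken from the article: the two-state, two-action model, the arrays `P` and `r`, the function name `backward_induction`, the zero terminal reward and the discount factor `gamma` are all assumptions chosen for illustration, and discounting stands in here for the unspecified "certain conditions" under which the finite-horizon values converge.

```python
import numpy as np


def backward_induction(P, r, horizon, gamma=1.0):
    """Solve a finite-horizon MDP with the Bellman recursion.

    P has shape (actions, states, states): P[a, s, t] is the probability of
    moving from state s to state t under action a.
    r has shape (states, actions): r[s, a] is the one-stage reward.
    Returns V with shape (horizon + 1, states) and a stage-dependent greedy
    policy with shape (horizon, states). The terminal reward is assumed zero.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros((horizon + 1, n_states))
    policy = np.zeros((horizon, n_states), dtype=int)
    for n in range(horizon - 1, -1, -1):
        # Q[s, a] = r[s, a] + gamma * sum_t P[a, s, t] * V[n + 1, t]
        Q = r + gamma * np.einsum("ast,t->sa", P, V[n + 1])
        V[n] = Q.max(axis=1)
        policy[n] = Q.argmax(axis=1)
    return V, policy


# Hypothetical toy model (assumption, not from the article).
P = np.array([
    [[0.9, 0.1], [0.4, 0.6]],  # transitions under action 0
    [[0.2, 0.8], [0.1, 0.9]],  # transitions under action 1
])
r = np.array([
    [1.0, 0.0],  # rewards in state 0 for actions 0 and 1
    [0.0, 2.0],  # rewards in state 1 for actions 0 and 1
])

if __name__ == "__main__":
    # With discounting, stage-0 values for long horizons should nearly agree,
    # which mimics approximating the infinite-horizon value by finite horizons.
    V_short, _ = backward_induction(P, r, horizon=200, gamma=0.9)
    V_long, _ = backward_induction(P, r, horizon=400, gamma=0.9)
    print("max gap between horizons:", np.max(np.abs(V_short[0] - V_long[0])))
```

The policy is stored per stage because, in a finite-horizon problem, the optimal action can depend on the remaining time; only in the limit does one expect a stationary policy.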
Markov decision process
Markov chain
Bellman equation
policy improvement
linear programming