Markov decision processes (Q5894023)

This paper considers fully observable Markov decision processes with finite and infinite time horizons. In the finite-horizon case, the solution of the Markov decision problem can be obtained by solving a Bellman equation under an integrability assumption and a structure assumption. Sufficient conditions for the two assumptions are discussed, and two applications, a card game and a stochastic linear-quadratic control problem, are presented to illustrate the proposed solution method. In the infinite-horizon case, the authors show that, under certain conditions, both the reward value and the optimal policy can be approximated by the sequence of reward values and optimal policies of the corresponding finite-horizon problems. Bandit and dividend pay-out problems are presented to illustrate this part of the theory.
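As a rough illustration of the two ideas summarized above, the following sketch solves a small finite-horizon problem by backward induction on the Bellman equation and then compares values across horizons. It is not the authors' construction: the finite state and action sets, the zero terminal reward, the discount factor `beta`, the function name `backward_induction`, and the toy arrays `P` and `r` are all assumptions made for this example.

```python
import numpy as np


def backward_induction(P, r, horizon, beta=1.0):
    """Solve a finite-horizon MDP by backward induction (illustrative sketch).

    P: array of shape (A, S, S), P[a, s, t] = probability of moving s -> t under action a.
    r: array of shape (A, S), r[a, s] = one-stage reward for action a in state s.
    Returns the stage-0 value vector and a (horizon, S) table of maximizing actions.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)  # assumption: terminal reward is zero
    policy = np.zeros((horizon, n_states), dtype=int)
    for n in range(horizon - 1, -1, -1):
        # Bellman step: Q[a, s] = r[a, s] + beta * sum_t P[a, s, t] * V[t]
        Q = r + beta * (P @ V)
        policy[n] = Q.argmax(axis=0)
        V = Q.max(axis=0)
    return V, policy


# Hypothetical toy model with 2 states and 2 actions.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # transitions under action 0
    [[0.5, 0.5], [0.6, 0.4]],  # transitions under action 1
])
r = np.array([
    [1.0, 0.0],  # rewards of action 0 in states 0 and 1
    [0.5, 2.0],  # rewards of action 1 in states 0 and 1
])

if __name__ == "__main__":
    for N in (1, 10, 50, 200):
        V_N, _ = backward_induction(P, r, N, beta=0.9)
        print(N, V_N)
```

With `beta < 1` and bounded rewards, the printed values settle as the horizon `N` grows, which mirrors, in a much simpler setting, the approximation of the infinite-horizon value by finite-horizon values described in the review. The conditions the paper actually requires are more general than this discounted toy case.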

reviewed by

Changzhi Wu

zbMATH Keywords

Markov decision process

Markov chain

Bellman equation

policy improvement