Reward revision and the average reward Markov decision process (Q1097179)
From MaRDI portal
scientific article; zbMATH DE number 4033533
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Reward revision and the average reward Markov decision process | scientific article; zbMATH DE number 4033533 | |
Statements
Reward revision and the average reward Markov decision process (English)
1987
We integrate two numerical procedures for solving the average reward Markov decision process (MDP): standard successive approximations and modified policy iteration with reward revision. Reward revision is the process of revising the reward structure of a second, more computationally desirable MDP so as to produce, in the limit, an optimality equation whose fixed point is identical to that of the original MDP. A numerical study indicates that for MDPs having a non-sparse structure with a small number of relatively large entries per row, the addition of reward revision can yield significant computational benefits.
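The "standard successive approximations" procedure named in the abstract is, for average-reward MDPs, commonly realized as relative value iteration. The following is a minimal generic sketch of that technique, not the paper's own algorithm; the function name, the span-seminorm stopping rule, and the toy two-state MDP are illustrative assumptions.

```python
import numpy as np

def relative_value_iteration(P, r, tol=1e-8, max_iter=10_000):
    """Successive approximations (relative value iteration) for an
    average-reward MDP.

    P : (A, S, S) array of transition matrices, one per action
    r : (A, S) array of expected one-step rewards
    Returns an estimate of the optimal gain and a greedy policy.
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        # One-step Bellman backup over all actions: q has shape (A, S).
        q = r + P @ v
        v_new = q.max(axis=0)
        # Stop when the increment v_new - v is nearly constant across
        # states (span seminorm below tol); that constant is the gain.
        diff = v_new - v
        if diff.max() - diff.min() < tol:
            gain = 0.5 * (diff.max() + diff.min())
            return gain, q.argmax(axis=0)
        # Subtract a reference component to keep the iterates bounded.
        v = v_new - v_new[0]
    raise RuntimeError("relative value iteration did not converge")

# Toy 2-state, 2-action MDP (a made-up example, not from the paper):
# action 0 = "stay" (identity transitions), action 1 = "switch".
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
r = np.array([[1.0, 2.0],   # staying pays the current state's reward
              [0.0, 0.0]])  # switching pays nothing
gain, policy = relative_value_iteration(P, r)
```

Here the optimal behavior is to switch into state 1 and stay there, so the long-run average reward (gain) is 2 from either starting state.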
average reward Markov decision process
successive approximations
modified policy iteration
reward revision