Temporal difference-based policy iteration for optimal control of stochastic systems (Q467477)
scientific article; zbMATH DE number 6363606
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Temporal difference-based policy iteration for optimal control of stochastic systems | scientific article; zbMATH DE number 6363606 | |
Statements
Temporal difference-based policy iteration for optimal control of stochastic systems (English)
3 November 2014
The authors consider an infinite-horizon stochastic optimal control problem for a discrete-time dynamic system with additive random noise and a discounted average cost performance index. To find an optimal feedback control law, approximate dynamic programming (see [\textit{W. B. Powell}, Approximate dynamic programming. Solving the curses of dimensionality. Hoboken, NJ: John Wiley \& Sons (2007; Zbl 1156.90021)]) is used, in which the cost-to-go function is estimated by a temporal difference-based learning algorithm (see [\textit{R. S. Sutton}, ``Learning to predict by the methods of temporal differences'', Mach. Learn. 3, 9--44 (1988)]). The main contribution of the paper is a continuous least squares policy evaluation algorithm that enables potential-based policy iteration on a continuous state space. The algorithm is derived by solving a fixed-point equation based on the discounted Poisson equation. A continuous least squares temporal difference algorithm is also derived, and a class of basis functions in the form of Euclidean distance functions is proposed to simplify the computations. The proposed methodology is illustrated by simulation examples.
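In the standard discounted-cost setting the review describes (the notation below is assumed for illustration, not taken from the paper), evaluating a fixed policy \(\mu\) amounts to solving the fixed-point (discounted Poisson) equation

\[ J^\mu(x) = \mathbb{E}_w\!\left[\, g\big(x,\mu(x),w\big) + \alpha\, J^\mu\big(f(x,\mu(x),w)\big) \right], \qquad 0 < \alpha < 1, \]

where \(f\) is the system dynamics, \(g\) the stage cost, \(w\) the additive noise, and \(\alpha\) the discount factor; the least squares algorithms approximate \(J^\mu\) by a linear combination \(\sum_k r_k\,\phi_k(x)\), with Euclidean distance basis functions \(\phi_k(x) = \|x - c_k\|\) for fixed centers \(c_k\).

A minimal sketch of sampled-trajectory least squares temporal difference (LSTD) policy evaluation with such basis functions follows. All names (`phi`, `lstd`, `centers`) and the toy random-walk dynamics are illustrative assumptions; the paper's continuous-state algorithm works with the Poisson equation directly rather than with a single sampled trajectory.

```python
import numpy as np

# Euclidean-distance basis: phi_k(x) = ||x - c_k|| for fixed centers c_k
# (the form of basis function proposed in the paper; centers are assumed).
def phi(x, centers):
    return np.linalg.norm(x[None, :] - centers, axis=1)

# LSTD(0) policy evaluation on one sampled trajectory: solve A r = b with
#   A = sum_t phi_t (phi_t - alpha * phi_{t+1})^T,  b = sum_t phi_t g_t.
def lstd(states, stage_costs, centers, alpha=0.95):
    k = len(centers)
    A = np.zeros((k, k))
    b = np.zeros(k)
    for t in range(len(states) - 1):
        f_t = phi(states[t], centers)
        f_next = phi(states[t + 1], centers)
        A += np.outer(f_t, f_t - alpha * f_next)
        b += f_t * stage_costs[t]
    # least squares solve, in case A is ill-conditioned for short trajectories
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Toy usage: a random walk in R^2 under a fixed policy with quadratic
# stage cost; both are stand-ins, not the paper's simulation examples.
rng = np.random.default_rng(0)
centers = rng.uniform(-1.0, 1.0, size=(6, 2))
states = np.cumsum(0.1 * rng.standard_normal((200, 2)), axis=0)
stage_costs = np.sum(states[:-1] ** 2, axis=1)
weights = lstd(states, stage_costs, centers)
print("fitted basis weights:", weights)
```

The fitted weight vector defines the approximate cost-to-go \(\hat J^\mu(x) = \sum_k r_k \|x - c_k\|\), which a policy iteration scheme would then minimize over controls to obtain the next policy.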
stochastic optimal control
least squares policy evaluation algorithm
approximate dynamic programming
learning algorithms
discrete-time systems