Temporal difference-based policy iteration for optimal control of stochastic systems
DOI: 10.1007/s10957-013-0418-1; zbMath: 1306.93074; OpenAlex: W2080453320; MaRDI QID: Q467477
Xiao-Mei Liu, Kang Cheng, Kanjian Zhang, Haikun Wei, Shu-Min Fei
Publication date: 3 November 2014
Published in: Journal of Optimization Theory and Applications
Full work available at URL: https://doi.org/10.1007/s10957-013-0418-1
stochastic optimal control; approximate dynamic programming; learning algorithms: discrete-time systems; least squares policy evaluation algorithm
Dynamic programming in optimal control and differential games (49L20); Discrete-time control/observation systems (93C55); Dynamic programming (90C39); Optimal stochastic control (93E20); Stochastic systems in control theory (general) (93E03); Existence of optimal solutions to problems involving randomness (49J55)
Cites Work
- Stochastic control via direct comparison
- Markov chains and stochastic stability
- A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: multichain cases
- Projected equation methods for approximate solution of large linear systems
- Single sample path-based optimization of Markov chains
- Least squares policy evaluation algorithms with linear function approximation
- Linear least-squares algorithms for temporal difference learning
- A unified approach to Markov decision problems and performance sensitivity analysis
- Policy iteration based feedback control
- Approximate policy iteration: a survey and some new methods
- A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
- On the use of the deterministic Lyapunov function for the ergodicity of stochastic difference equations
- An analysis of temporal-difference learning with function approximation
- Perturbation realization, potentials, and sensitivity analysis of Markov processes
- 10.1162/1532443041827907
- Convergence Results for Some Temporal Difference Methods Based on Least Squares
- Potential-Based Online Policy Iteration Algorithms for Markov Decision Processes
- Approximate Dynamic Programming