Reinforcement learning for optimal feedback control. A Lyapunov-based approach (Q1650646)

From MaRDI portal

scientific article; zbMATH DE number 6898569

    Statements

    Reinforcement learning for optimal feedback control. A Lyapunov-based approach (English)
    4 July 2018
    The book is devoted to the design of approximate optimal controllers based on reinforcement learning and to related decision-making problems. All considerations rest on a rigorous mathematical treatment covering optimal control, stability theory, and the convergence of the design procedures. The authors discuss both model-free and model-based methods. The treatment of optimal control and optimal decision making is based on the principle of optimality and dynamic programming techniques. Since these techniques lead to Hamilton-Jacobi-Bellman equations whose solutions can usually be found only numerically, the authors concentrate on approximate dynamic programming algorithms that use a parametric approximation of the optimal policy and/or the optimal value function. One way to implement such approximate iteration algorithms is based on an adaptive-critic reward system [\textit{R. S. Sutton} and \textit{A. G. Barto}, Reinforcement learning: an introduction. Cambridge: MIT Press (1998); \url{https://mitpress.mit.edu/books/reinforcement-learning}] used in reinforcement learning. The book opens with a brief introduction to optimal control, followed by a review of techniques for exact and approximate dynamic programming; their relations to Lyapunov-based stability analysis and to differential games are also presented. The authors then discuss the problem of adaptive online approximate optimal control of uncertain nonlinear systems. A new adaptive-critic-identifier architecture is proposed, based on persistence-of-excitation online learning schemes. Some similarities with the Werbos approach [\textit{P. J. Werbos}, Lect. Notes Comput. Sci. Eng. 50, 15--34 (2006; Zbl 1270.65015)] to heuristic dynamic programming can be found. The authors then introduce concurrent learning to solve approximate optimal control problems under a finite excitation condition.
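    The idea of a parametric approximation of the optimal value function can be illustrated on a toy problem not taken from the book: for a scalar discrete-time linear system $x^+ = ax + bu$ with stage cost $x^2 + u^2$, the optimal value function is exactly $V(x) = p\,x^2$, and value iteration reduces to a fixed-point iteration on the single parameter $p$ (the scalar Riccati difference equation). All names below are illustrative.

```python
def value_iteration(a=0.9, b=1.0, iters=200):
    """Value iteration with the parametric value function V(x) = p*x^2
    for the scalar system x+ = a*x + b*u and stage cost x^2 + u^2.

    Minimizing u analytically in the Bellman update gives the scalar
    Riccati difference equation below; p converges to the parameter of
    the optimal value function.
    """
    p = 0.0
    for _ in range(iters):
        p = 1.0 + a * a * p - (a * b * p) ** 2 / (1.0 + b * b * p)
    return p


p_star = value_iteration()
# Optimal feedback gain for the policy u = -k*x recovered from p:
k_star = 0.9 * 1.0 * p_star / (1.0 + 1.0 * p_star)
```

In this toy case the parametrization is exact; the book's methods address the general nonlinear setting, where the value function is only approximated by such a parametric form.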
The method is extended to trajectory tracking problems for uncertain systems and to finding approximate feedback Nash solutions of nonzero-sum differential games. The proposed reinforcement-learning-based procedures are applied to the design of feedback controllers for autonomous vehicles; the examples include station keeping of a marine craft and optimal path-following control of a mobile robot. The last chapter of the book is devoted to computational considerations related to the proposed controller design methodology and to the analysis of control and decision-making systems. The discussion is based on the theory of universal reproducing kernels.
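    Universal reproducing kernels (e.g. the Gaussian kernel) can approximate any continuous value function on a compact set arbitrarily well. A minimal sketch, not from the book, of fitting sampled values of a quadratic value function by regularized kernel regression (all names and parameters are illustrative):

```python
import numpy as np


def gaussian_kernel(X, Y, sigma=0.5):
    """Gaussian (universal) kernel matrix for 1-D sample arrays X, Y."""
    d = X[:, None] - Y[None, :]
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))


def fit_kernel_model(x_train, y_train, lam=1e-6, sigma=0.5):
    """Kernel ridge regression: solve (K + lam*I) alpha = y, then
    predict with V(x) = sum_i alpha_i * k(x, x_i)."""
    K = gaussian_kernel(x_train, x_train, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)
    return lambda x: gaussian_kernel(x, x_train, sigma) @ alpha


x = np.linspace(-1.0, 1.0, 25)
v = x ** 2  # samples of a (known) quadratic value function
model = fit_kernel_model(x, v)
```

The universality of the kernel guarantees that such expansions are dense in the space of continuous functions, which underlies the computational framework of the last chapter.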
    Keywords: optimal control; reinforcement learning; Lyapunov-based approach; approximate optimal controllers; approximate dynamic programming; HJB equations; differential games; autonomous vehicles; universal reproducing kernels
