A Generalized Stacked Reinforcement Learning Method for Sampled Systems

From MaRDI portal
Publication: 6198424

DOI: 10.1109/TAC.2023.3250032
arXiv: 2108.10392
OpenAlex: W3195923278
MaRDI QID: Q6198424

Unnamed Author, Pavel Osinenko, Unnamed Author, Unnamed Author

Publication date: 22 February 2024

Published in: IEEE Transactions on Automatic Control

Abstract: A common setting of reinforcement learning (RL) is a Markov decision process (MDP) in which the environment is a stochastic discrete-time dynamical system. Whereas MDPs are suitable in applications such as video games or puzzles, physical systems are continuous in time. A general variant of RL operates in a digital format, where updates of the value (or cost) function and the policy are performed at discrete moments in time. The agent-environment loop then amounts to a sampled system, of which sample-and-hold is a special case. In this paper, we propose and benchmark two RL methods suitable for sampled systems. Specifically, we hybridize model-predictive control (MPC) with critics that learn the optimal Q-function and the value (or cost-to-go) function. Optimality is analyzed, and a performance comparison is carried out in an experimental case study with a mobile robot.


Full work available at URL: https://arxiv.org/abs/2108.10392
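To make the abstract's idea concrete, the sketch below shows a sample-and-hold agent-environment loop in which a short-horizon MPC controller uses a learned critic as its terminal cost-to-go, with the critic updated by a temporal-difference step. This is only a minimal illustration of the general scheme described in the abstract, not the authors' algorithm: the toy dynamics, quadratic features, constant-action rollouts, and all hyperparameters are assumptions made for this example.

```python
# Illustrative sketch (not the paper's implementation): a digital, sample-and-hold
# control loop where MPC is augmented with a learned critic approximating the
# cost-to-go. All names, dynamics, and hyperparameters are assumptions.
import numpy as np

DT = 0.1          # sampling period of the digital controller (sample-and-hold)
HORIZON = 5       # MPC prediction horizon, in sampling steps
N_FEATURES = 3    # size of the critic's feature vector


def dynamics(x, u):
    """Toy sampled dynamics (double integrator, Euler step over one period)."""
    x_next = x.copy()
    x_next[0] += DT * x[1]
    x_next[1] += DT * u
    return x_next


def stage_cost(x, u):
    """Quadratic running cost accumulated over one sampling interval."""
    return DT * (x @ x + 0.1 * u * u)


def features(x):
    """Quadratic features for a linear-in-weights critic (an assumption)."""
    return np.array([x[0] ** 2, x[1] ** 2, x[0] * x[1]])


def critic_value(w, x):
    """Critic's estimate of the cost-to-go from state x."""
    return w @ features(x)


def mpc_action(w, x, candidate_actions):
    """Pick the first action of the cheapest constant-action rollout over the
    horizon, using the critic as terminal cost (a crude stand-in for a real
    MPC optimizer)."""
    best_u, best_cost = 0.0, np.inf
    for u in candidate_actions:
        xi, total = x.copy(), 0.0
        for _ in range(HORIZON):
            total += stage_cost(xi, u)
            xi = dynamics(xi, u)
        total += critic_value(w, xi)  # learned terminal cost-to-go
        if total < best_cost:
            best_u, best_cost = u, total
    return best_u


def critic_update(w, x, u, x_next, lr=1e-2):
    """One semi-gradient temporal-difference step toward the Bellman target."""
    target = stage_cost(x, u) + critic_value(w, x_next)
    td_error = target - critic_value(w, x)
    return w + lr * td_error * features(x)


if __name__ == "__main__":
    w = np.zeros(N_FEATURES)
    x = np.array([1.0, 0.0])
    actions = np.linspace(-1.0, 1.0, 21)
    for step in range(200):
        u = mpc_action(w, x, actions)       # policy: critic-augmented MPC
        x_next = dynamics(x, u)             # environment advances one sample
        w = critic_update(w, x, u, x_next)  # critic learns the cost-to-go
        x = x_next
    print("final state:", x, "critic weights:", w)
```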

This page was built for publication: A Generalized Stacked Reinforcement Learning Method for Sampled Systems