Distributed Reinforcement Learning for Decentralized Linear Quadratic Control: A Derivative-Free Policy Optimization Approach
Publication:6053152
DOI: 10.1109/TAC.2021.3128592; arXiv: 1912.09135; OpenAlex: W3213818741; MaRDI QID: Q6053152
Runyu Zhang, Yujie Tang, Ying-Ying Li, Na Li
Publication date: 26 September 2023
Published in: IEEE Transactions on Automatic Control
Abstract: This paper considers a distributed reinforcement learning problem for decentralized linear quadratic control with partial state observations and local costs. We propose a Zero-Order Distributed Policy Optimization algorithm (ZODPO) that learns linear local controllers in a distributed fashion, leveraging the ideas of policy gradient, zero-order optimization and consensus algorithms. In ZODPO, each agent estimates the global cost by consensus, and then conducts local policy gradient in parallel based on zero-order gradient estimation. ZODPO only requires limited communication and storage even in large-scale systems. Further, we investigate the nonasymptotic performance of ZODPO and show that the sample complexity to approach a stationary point is polynomial with the error tolerance's inverse and the problem dimensions, demonstrating the scalability of ZODPO. We also show that the controllers generated throughout ZODPO are stabilizing controllers with high probability. Lastly, we numerically test ZODPO on multi-zone HVAC systems.
Full work available at URL: https://arxiv.org/abs/1912.09135
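The two ingredients named in the abstract, consensus-based cost estimation and zero-order gradient estimation, can be illustrated with a minimal sketch. This is not the paper's ZODPO algorithm: the function names, the doubly stochastic mixing matrix `W`, the one-scalar-gain-per-agent setup, the choice of the average local cost as the global cost, and the toy quadratic costs standing in for the linear quadratic cost are all assumptions made for illustration.

```python
import numpy as np


def consensus_average(local_values, W, num_rounds):
    """Mix local values with neighbors; W is assumed doubly stochastic."""
    x = np.asarray(local_values, dtype=float).copy()
    for _ in range(num_rounds):
        x = W @ x  # each agent takes a weighted average over its neighbors
    return x


def sample_unit_sphere(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)


def zero_order_gradient(cost_fn, K, radius, rng):
    """One-point estimate (d / r) * J(K + r*u) * u from a single cost evaluation."""
    u = sample_unit_sphere(K.size, rng).reshape(K.shape)
    return (K.size / radius) * cost_fn(K + radius * u) * u


def distributed_step(local_costs, W, K, radius, step_size, rng, num_rounds=50):
    """One illustrative iteration: perturb, evaluate locally, agree, update."""
    n = K.size  # hypothetical toy setup: one scalar gain per agent
    u = sample_unit_sphere(n, rng)
    K_perturbed = K + radius * u
    # Each agent observes only its own local cost at the perturbed policy.
    observed = [cost(K_perturbed) for cost in local_costs]
    # Consensus drives every agent's value toward the average local cost,
    # which this sketch treats as the global cost.
    global_cost_estimates = consensus_average(observed, W, num_rounds)
    # Agent i scales its own perturbation u[i] by its own global cost estimate.
    grad = (n / radius) * global_cost_estimates * u
    return K - step_size * grad
```

In this sketch each agent needs only its own entry of the perturbation and a scalar cost estimate obtained from its neighbors, which is the sense in which communication and storage stay small. The one-point estimator is unbiased for a smoothed version of the cost but has high variance, so small step sizes or averaging over many samples would be needed in practice.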
Related Items (5)
- Analysis of the optimization landscape of Linear Quadratic Gaussian (LQG) control
- Byzantine-Resilient Decentralized Policy Evaluation With Linear Function Approximation
- Small-disturbance input-to-state stability of perturbed gradient flows: applications to LQR problem
- Learning decentralized linear quadratic regulators with \(\sqrt{T}\) regret
- System stabilization with policy optimization on unstable latent manifolds
Recommendations
- Unnamed Item
- Distributed learning and cooperative control for multi-agent systems
- Distributed point-to-point iterative learning control for multi-agent systems with quantization
- Reinforcement learning for distributed control and multi-player games
- Near Optimal LQR Performance in the Decentralized Setting
- Decentralized iterative learning control methods for large scale linear dynamic systems
- Distributed adaptive iterative learning control for nonlinear multiagent systems with state constraints
- Multiagent Fully Decentralized Value Function Learning With Linear Convergence Rates
- Efficient Learning of Distributed Linear-Quadratic Control Policies