Lookahead-Bounded Q-Learning

arXiv: 2006.15690 | MaRDI QID: Q6343917

Author name not available

Publication date: 28 June 2020

Abstract: We introduce the lookahead-bounded Q-learning (LBQL) algorithm, a new, provably convergent variant of Q-learning that seeks to improve the performance of standard Q-learning in stochastic environments through the use of "lookahead" upper and lower bounds. To do this, LBQL employs previously collected experience and each iteration's state-action values as dual feasible penalties to construct a sequence of sampled information relaxation problems. The solutions to these problems provide estimated upper and lower bounds on the optimal value, which we track via stochastic approximation. These quantities are then used to constrain the iterates to stay within the bounds at every iteration. Numerical experiments on benchmark problems show that LBQL exhibits faster convergence and greater robustness to hyperparameters when compared to standard Q-learning and several related techniques. Our approach is particularly appealing in problems that require expensive simulations or real-world interactions.
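
The step of constraining the iterates to stay within the bounds can be illustrated with a minimal sketch. It assumes a tabular setting and takes the lower and upper bound estimates as given inputs; it does not construct the information relaxation problems that LBQL uses to obtain them. The function name `bounded_q_update`, the parameters `alpha` and `gamma`, and the toy numbers are hypothetical and are not taken from the paper or its companion repository.

```python
def bounded_q_update(Q, lower, upper, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step followed by clipping to [lower, upper].

    Assumption: `lower` and `upper` hold externally supplied bound estimates
    for each state-action pair; how they are estimated is out of scope here.
    """
    # Standard Q-learning target: reward plus discounted best next-state value.
    target = r + gamma * max(Q[s_next])
    q_new = (1.0 - alpha) * Q[s][a] + alpha * target
    # Keep the new iterate inside the current bound estimates.
    Q[s][a] = min(max(q_new, lower[s][a]), upper[s][a])
    return Q[s][a]


# Toy usage with two states and two actions; all numbers are illustrative.
Q = [[0.0, 0.0], [0.0, 0.0]]
lower = [[0.0, 0.0], [0.0, 0.0]]
upper = [[0.5, 0.5], [0.5, 0.5]]

# A large reward pushes the unconstrained update to 1.0, above the upper bound.
clipped = bounded_q_update(Q, lower, upper, s=0, a=1, r=10.0, s_next=1)

# A small reward gives an update of about 0.1, which already lies in the bounds.
unclipped = bounded_q_update(Q, lower, upper, s=1, a=0, r=1.0, s_next=1)
```

In the first call the update is cut back to the upper bound of 0.5, while the second call leaves the ordinary Q-learning update unchanged because it already falls inside the interval.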

Has companion code repository: https://github.com/ibrahim-elshar/LBQL_ICML2020
