AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity

From MaRDI portal
Publication:6310634

arXiv: 1812.00885 · MaRDI QID: Q6310634

Author name not available

Publication date: 3 December 2018

Abstract: In this paper, we propose AsyncQVI, an asynchronous-parallel Q-value iteration for discounted Markov decision processes whose transitions and rewards can only be sampled through a generative model. Given such a problem with $|\mathcal{S}|$ states, $|\mathcal{A}|$ actions, and a discount factor $\gamma\in(0,1)$, AsyncQVI uses memory of size $\mathcal{O}(|\mathcal{S}|)$ and returns an $\varepsilon$-optimal policy with probability at least $1-\delta$ using $\tilde{\mathcal{O}}\big(\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^5\varepsilon^2}\log(\frac{1}{\delta})\big)$ samples. AsyncQVI is also the first asynchronous-parallel algorithm for discounted Markov decision processes with a sample complexity that nearly matches the theoretical lower bound. Its relatively low memory footprint and parallelism make AsyncQVI suitable for large-scale applications. In numerical tests, we compare AsyncQVI with four sample-based value iteration methods. The results show that our algorithm is highly efficient and achieves linear parallel speedup.
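To illustrate the setting the abstract describes, the following is a minimal serial sketch of sampled Q-value iteration against a generative model, on a toy MDP invented for this example (the sizes, learning rate, and helper names here are assumptions, not from the paper). Each update draws one sample from the model and applies a relaxed Bellman update; the actual AsyncQVI runs many such updates asynchronously in parallel, which this sketch does not attempt to reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP (hypothetical example, not from the paper).
S, A = 5, 3          # |S| states, |A| actions
gamma = 0.9          # discount factor in (0, 1)
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] is a distribution over next states
R = rng.uniform(0.0, 1.0, size=(S, A))      # deterministic rewards in [0, 1]

def sample_model(s, a):
    """Generative-model oracle: returns one sampled next state and the reward."""
    s_next = rng.choice(S, p=P[s, a])
    return s_next, R[s, a]

def sampled_qvi(num_sweeps=2000, alpha=0.1):
    """Serial sketch of sampled Q-value iteration.

    Each (s, a) update uses a single generative-model sample, mimicking one
    worker's step; AsyncQVI itself performs such updates asynchronously
    across parallel workers.
    """
    Q = np.zeros((S, A))
    for _ in range(num_sweeps):
        for s in range(S):
            for a in range(A):
                s_next, r = sample_model(s, a)
                target = r + gamma * Q[s_next].max()  # sampled Bellman target
                Q[s, a] += alpha * (target - Q[s, a])  # relaxed update
    return Q

Q = sampled_qvi()
policy = Q.argmax(axis=1)  # greedy policy extracted from the learned Q-table
```

Since rewards lie in [0, 1], every Q-value stays within [0, 1/(1-γ)] = [0, 10] throughout, which gives a quick sanity check on the learned table.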




Has companion code repository: https://github.com/uclaopt/AsyncQVI








