AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity
From MaRDI portal
Publication: 6310634
arXiv: 1812.00885 · MaRDI QID: Q6310634
Author name not available
Publication date: 3 December 2018
Abstract: In this paper, we propose AsyncQVI, an asynchronous-parallel Q-value iteration for discounted Markov decision processes whose transition and reward can only be sampled through a generative model. Given such a problem with $|\mathcal{S}|$ states, $|\mathcal{A}|$ actions, and a discount factor $\gamma\in(0,1)$, AsyncQVI uses memory of size $\mathcal{O}(|\mathcal{S}|)$ and returns an $\varepsilon$-optimal policy with probability at least $1-\delta$ using $\tilde{\mathcal{O}}\big(\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^5\varepsilon^2}\log(\frac{1}{\delta})\big)$ samples. AsyncQVI is also the first asynchronous-parallel algorithm for discounted Markov decision processes whose sample complexity nearly matches the theoretical lower bound. The relatively low memory footprint and parallel ability make AsyncQVI suitable for large-scale applications. In numerical tests, we compare AsyncQVI with four sample-based value iteration methods. The results show that our algorithm is highly efficient and achieves linear parallel speedup.
Has companion code repository: https://github.com/uclaopt/AsyncQVI
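The abstract describes Q-value iteration in which Bellman backups are estimated from samples drawn via a generative model. Below is a minimal single-threaded sketch of that idea (the actual AsyncQVI runs these updates asynchronously across parallel workers; see the repository above for the authors' implementation). The function names, the toy interface `sample(s, a, rng)`, and all parameter values are illustrative assumptions, not from the paper.

```python
import numpy as np

def sampled_q_value_iteration(sample, n_states, n_actions, gamma=0.9,
                              n_iters=2000, samples_per_update=10, seed=0):
    """Sampled Q-value iteration with a generative model (illustrative sketch).

    `sample(s, a, rng)` returns a pair (next_state, reward) drawn from the MDP;
    this generative-model interface is an assumption for the sketch.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    V = np.zeros(n_states)  # V(s) = max_a Q(s, a), kept in sync below
    for _ in range(n_iters):
        # Pick one (state, action) pair to update; AsyncQVI's parallel
        # workers would perform such updates asynchronously.
        s = int(rng.integers(n_states))
        a = int(rng.integers(n_actions))
        # Monte Carlo estimate of r(s, a) + gamma * E[V(s')]
        est = 0.0
        for _ in range(samples_per_update):
            s_next, r = sample(s, a, rng)
            est += r + gamma * V[s_next]
        Q[s, a] = est / samples_per_update
        V[s] = Q[s].max()
    policy = Q.argmax(axis=1)
    return Q, policy
```

Note the memory footprint: only the tables Q and V are stored, and V alone is $\mathcal{O}(|\mathcal{S}|)$, which is the quantity the paper's memory bound refers to.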