AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity

From MaRDI portal
Publication:6310634

arXiv: 1812.00885 · MaRDI QID: Q6310634

Author name not available

Publication date: 3 December 2018

Abstract: In this paper, we propose AsyncQVI, an asynchronous-parallel Q-value iteration for discounted Markov decision processes whose transitions and rewards can only be sampled through a generative model. Given such a problem with $|\mathcal{S}|$ states, $|\mathcal{A}|$ actions, and a discount factor $\gamma\in(0,1)$, AsyncQVI uses memory of size $\mathcal{O}(|\mathcal{S}|)$ and returns an $\varepsilon$-optimal policy with probability at least $1-\delta$ using $\tilde{\mathcal{O}}\big(\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^5\varepsilon^2}\log(\frac{1}{\delta})\big)$ samples. AsyncQVI is also the first asynchronous-parallel algorithm for discounted Markov decision processes with a sample complexity that nearly matches the theoretical lower bound. Its relatively low memory footprint and parallelism make AsyncQVI suitable for large-scale applications. In numerical tests, we compare AsyncQVI with four sample-based value iteration methods. The results show that our algorithm is highly efficient and achieves linear parallel speedup.
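To illustrate the setting the abstract describes, the following is a minimal serial sketch of sampled Q-value iteration against a generative model, on a toy MDP invented for this example (the sizes, learning rate, and helper names here are assumptions, not from the paper). Each update draws one sample from the model and applies a relaxed Bellman update; the actual AsyncQVI runs many such updates asynchronously in parallel, which this sketch does not attempt to reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP (hypothetical example, not from the paper).
S, A = 5, 3          # |S| states, |A| actions
gamma = 0.9          # discount factor in (0, 1)
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] is a distribution over next states
R = rng.uniform(0.0, 1.0, size=(S, A))      # deterministic rewards in [0, 1]

def sample_model(s, a):
    """Generative-model oracle: returns one sampled next state and the reward."""
    s_next = rng.choice(S, p=P[s, a])
    return s_next, R[s, a]

def sampled_qvi(num_sweeps=2000, alpha=0.1):
    """Serial sketch of sampled Q-value iteration.

    Each (s, a) update uses a single generative-model sample, mimicking one
    worker's step; AsyncQVI itself performs such updates asynchronously
    across parallel workers.
    """
    Q = np.zeros((S, A))
    for _ in range(num_sweeps):
        for s in range(S):
            for a in range(A):
                s_next, r = sample_model(s, a)
                target = r + gamma * Q[s_next].max()  # sampled Bellman target
                Q[s, a] += alpha * (target - Q[s, a])  # relaxed update
    return Q

Q = sampled_qvi()
policy = Q.argmax(axis=1)  # greedy policy extracted from the learned Q-table
```

Since rewards lie in [0, 1], every Q-value stays within [0, 1/(1-γ)] = [0, 10] throughout, which gives a quick sanity check on the learned table.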




Has companion code repository: https://github.com/uclaopt/AsyncQVI








