Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits

arXiv2002.07258MaRDI QIDQ6335008

Author name not available (Why is that?)

Publication date: 17 February 2020

Abstract: We consider combinatorial semi-bandits over a set of arms

c a l X s u b s e t {0, 1}^{d}

where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound

R (T) = c a l O B i g (d (l n m)^{2} (l n T) o v e r D e l t a_{m i n} B i g)

, but it has computational complexity

c a l O (| c a l X |)

which is typically exponential in

d

, and cannot be used in large dimensions. We propose the first algorithm which is both computationally and statistically efficient for this problem with regret

R (T) = c a l O B i g (d (l n m)^{2} (l n T) o v e r D e l t a_{m i n} B i g)

and computational complexity

. Our approach involves carefully designing an approximate version of ESCB with the same regret guarantees, showing that this approximate algorithm can be implemented in time

by repeatedly maximizing a linear function over

c a l X

subject to a linear budget constraint, and showing how to solve this maximization problems efficiently.

Has companion code repository: https://github.com/dourouc05/CombinatorialBandits.jl

This page was built for publication: Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q6335008)