Truncated LinUCB for Stochastic Linear Bandits
From MaRDI portal
Publication:6391995
arXiv2202.11735MaRDI QIDQ6391995
Author name not available (Why is that?)
Publication date: 23 February 2022
Abstract: This paper considers contextual bandits with a finite number of arms, where the contexts are independent and identically distributed -dimensional random vectors, and the expected rewards are linear in both the arm parameters and contexts. The LinUCB algorithm, which is near minimax optimal for related linear bandits, is shown to have a cumulative regret that is suboptimal in both the dimension and time horizon , due to its over-exploration. A truncated version of LinUCB is proposed and termed "Tr-LinUCB", which follows LinUCB up to a truncation time and performs pure exploitation afterwards. The Tr-LinUCB algorithm is shown to achieve regret if for a sufficiently large constant , and a matching lower bound is established, which shows the rate optimality of Tr-LinUCB in both and under a low dimensional regime. Further, if for some , the loss compared to the optimal is a multiplicative factor, which does not depend on . This insensitivity to overshooting in choosing the truncation time of Tr-LinUCB is of practical importance.
Has companion code repository: https://github.com/simonzhou86/tr_linucb
This page was built for publication: Truncated LinUCB for Stochastic Linear Bandits
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q6391995)