Rotting Infinitely Many-armed Bandits

From MaRDI portal
Publication:6389666

arXiv2201.12975MaRDI QIDQ6389666

Author name not available (Why is that?)

Publication date: 30 January 2022

Abstract: We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm decreases at each pull of the arm according to an arbitrary trend with maximum rotting rate varrho=o(1). We show that this learning problem has an Omega(maxvarrho1/3T,sqrtT) worst-case regret lower bound where T is the horizon time. We show that a matching upper bound ildeO(maxvarrho1/3T,sqrtT), up to a poly-logarithmic factor, can be achieved by an algorithm that uses a UCB index for each arm and a threshold value to decide whether to continue pulling an arm or remove the arm from further consideration, when the algorithm knows the value of the maximum rotting rate varrho. We also show that an ildeO(maxvarrho1/3T,T3/4) regret upper bound can be achieved by an algorithm that does not know the value of varrho, by using an adaptive UCB index along with an adaptive threshold value.




Has companion code repository: https://github.com/junghunkim7786/rotting_infinite_armed_bandits








This page was built for publication: Rotting Infinitely Many-armed Bandits

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q6389666)