Rotting Infinitely Many-armed Bandits
From MaRDI portal
Publication:6389666
arXiv2201.12975MaRDI QIDQ6389666
Author name not available (Why is that?)
Publication date: 30 January 2022
Abstract: We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm decreases at each pull of the arm according to an arbitrary trend with maximum rotting rate . We show that this learning problem has an worst-case regret lower bound where is the horizon time. We show that a matching upper bound , up to a poly-logarithmic factor, can be achieved by an algorithm that uses a UCB index for each arm and a threshold value to decide whether to continue pulling an arm or remove the arm from further consideration, when the algorithm knows the value of the maximum rotting rate . We also show that an regret upper bound can be achieved by an algorithm that does not know the value of , by using an adaptive UCB index along with an adaptive threshold value.
Has companion code repository: https://github.com/junghunkim7786/rotting_infinite_armed_bandits
This page was built for publication: Rotting Infinitely Many-armed Bandits
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q6389666)