Rotting Infinitely Many-armed Bandits

Author name not available (Why is that?)

Publication date: 30 January 2022

Abstract: We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm decreases at each pull of the arm according to an arbitrary trend with maximum rotting rate

v a r r h o = o (1)

. We show that this learning problem has an

O m e g a (m a x v a r r h o^{1 / 3} T, s q r t T)

worst-case regret lower bound where

T

is the horizon time. We show that a matching upper bound

i l d e O (m a x v a r r h o^{1 / 3} T, s q r t T)

, up to a poly-logarithmic factor, can be achieved by an algorithm that uses a UCB index for each arm and a threshold value to decide whether to continue pulling an arm or remove the arm from further consideration, when the algorithm knows the value of the maximum rotting rate

v a r r h o

. We also show that an

i l d e O (m a x v a r r h o^{1 / 3} T, T^{3 / 4})

regret upper bound can be achieved by an algorithm that does not know the value of

v a r r h o

, by using an adaptive UCB index along with an adaptive threshold value.

Has companion code repository: https://github.com/junghunkim7786/rotting_infinite_armed_bandits

This page was built for publication: Rotting Infinitely Many-armed Bandits

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q6389666)