Decentralized Heterogeneous Multi-Player Multi-Armed Bandits With Non-Zero Rewards on Collisions
From MaRDI portal
Publication:5088408
DOI: 10.1109/TIT.2021.3136095
zbMATH Open: 1497.91063
arXiv: 1910.09089
MaRDI QID: Q5088408
Akshayaa Magesh, Venugopal V. Veeravalli
Publication date: 13 July 2022
Published in: IEEE Transactions on Information Theory
Abstract: We consider a fully decentralized multi-player stochastic multi-armed bandit setting in which the players cannot communicate with each other and can observe only their own actions and rewards. The environment may appear differently to different players, i.e., the reward distributions for a given arm are heterogeneous across players. In the case of a collision (when more than one player plays the same arm), we allow the colliding players to receive non-zero rewards. The time horizon for which the arms are played is not known to the players. Within this setup, where the number of players is allowed to be greater than the number of arms, we present a policy that achieves near order-optimal expected regret of order O(log^{1+δ} T) for some 0 < δ < 1 over a time horizon of duration T. This paper is accepted at IEEE Transactions on Information Theory.
Full work available at URL: https://arxiv.org/abs/1910.09089
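The setting described in the abstract can be illustrated with a toy simulation. The sketch below is only an assumption-laden illustration, not the paper's actual policy: it uses plain per-player UCB1 indices (the paper's algorithm is more involved), hypothetical heterogeneous Bernoulli arm means, and a multiplicative collision penalty chosen here simply to model "non-zero rewards on collisions".

```python
import numpy as np

def simulate(T=2000, M=3, K=5, collision_factor=0.5, seed=0):
    """Toy decentralized heterogeneous multi-player bandit simulation.

    M players, K arms; each (player, arm) pair has its own Bernoulli mean
    (heterogeneous environment). Players cannot communicate and see only
    their own actions and rewards. On a collision the reward is scaled by
    collision_factor rather than zeroed out. All specifics (UCB1 indices,
    the penalty model, the parameter values) are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    mu = rng.uniform(0.1, 0.9, size=(M, K))   # hidden per-player arm means
    counts = np.zeros((M, K), dtype=int)      # per-player pull counts
    sums = np.zeros((M, K))                   # per-player observed reward sums
    total_reward = np.zeros(M)

    for t in range(1, T + 1):
        choices = np.empty(M, dtype=int)
        for m in range(M):
            if t <= K:
                # round-robin initialization so every arm has one sample
                choices[m] = (t - 1) % K
            else:
                # each player computes UCB1 from its own observations only
                ucb = sums[m] / counts[m] + np.sqrt(2 * np.log(t) / counts[m])
                choices[m] = int(np.argmax(ucb))

        occupancy = np.bincount(choices, minlength=K)
        for m in range(M):
            a = choices[m]
            base = rng.binomial(1, mu[m, a])
            # collision reduces, but does not zero out, the reward
            r = base * (collision_factor if occupancy[a] > 1 else 1.0)
            counts[m, a] += 1
            sums[m, a] += r
            total_reward[m] += r

    return total_reward, counts
```

Note that `M > K` is allowed here, matching the abstract's claim that the number of players may exceed the number of arms; in that case some collisions are unavoidable, which is precisely why a non-zero collision reward model matters.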
Computational methods for problems pertaining to statistics (62-08)
Probabilistic models, generic numerical methods in probability and statistics (65C20)
n-person games, n > 2 (91A06)
Statistical decision theory (62C99)
Probabilistic games; gambling (91A60)
Related Items (3)
Multiplayer Bandits Without Observing Collision Information ⋮ Decentralized Learning for Multiplayer Multiarmed Bandits ⋮ Distributed learning in congested environments with partial information