A dynamic programming strategy to balance exploration and exploitation in the bandit problem

Publication:647433