Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards

From MaRDI portal
Publication:2006767