The following pages link to (Q5396640):
Displaying 13 items.
- Doubly robust policy evaluation and optimization (Q252797)
- Corruption-tolerant bandit learning (Q669323)
- Extracting certainty from uncertainty: regret bounded by variation in costs (Q1959595)
- Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm (Q2091834)
- Ballooning multi-armed bandits (Q2238588)
- Non-stationary stochastic optimization (Q2795881)
- Truthful mechanisms with implicit payment computation (Q2796397)
- Learning Theory (Q4680919)
- (Q4998863)
- Optimal Exploration–Exploitation in a Multi-armed Bandit Problem with Non-stationary Rewards (Q5113912)
- Bandits with Global Convex Constraints and Objective (Q5129206)
- AN ONLINE PORTFOLIO SELECTION ALGORITHM WITH REGRET LOGARITHMIC IN PRICE VARIATION (Q5247422)
- Relaxing the i.i.d. assumption: adaptively minimax optimal regret via root-entropic regularization (Q6183761)