Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits Under Realizability
From MaRDI portal
Publication: 5868941
DOI: 10.1287/moor.2021.1193
OpenAlex: W3013725403
MaRDI QID: Q5868941
Publication date: 26 September 2022
Published in: Mathematics of Operations Research
Full work available at URL: https://arxiv.org/abs/2003.12699
Keywords: computational efficiency, statistical learning, contextual bandit, offline regression, online-to-offline reduction
MSC classification: Computational learning theory (68Q32); General nonlinear regression (62J02); Learning and adaptive systems in artificial intelligence (68T05); Sequential statistical design (62L05)
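The keywords above describe the paper's central idea: reducing contextual bandits to an offline regression oracle, with action selection done by inverse-gap weighting over the oracle's predicted rewards. A minimal sketch of that weighting step (the function name, `y_hat`, and the choice of `gamma` are illustrative assumptions, not the paper's exact pseudocode):

```python
import numpy as np

def inverse_gap_weighting(y_hat, gamma):
    """Map predicted rewards for K actions to a sampling distribution.

    Non-greedy actions a get probability 1 / (K + gamma * (max_y - y_hat[a]));
    the greedy action receives the remaining mass. Larger gamma concentrates
    probability on the empirically best action (more exploitation).
    """
    K = len(y_hat)
    best = int(np.argmax(y_hat))
    # Inverse-gap probabilities for every action relative to the best one.
    p = 1.0 / (K + gamma * (y_hat[best] - np.asarray(y_hat, dtype=float)))
    # Greedy action absorbs whatever mass is left so that p sums to 1.
    p[best] = 0.0
    p[best] = 1.0 - p.sum()
    return p
```

An action is then drawn from this distribution (e.g. with `np.random.choice(K, p=p)`), the observed reward is logged, and the regression oracle is periodically refit on the accumulated (context, action, reward) data.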
Related Items (3)
Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning ⋮ Dealing with expert bias in collective decision-making ⋮ Recent advances in reinforcement learning in finance
Cites Work
- Batched bandit problems
- Empirical entropy, minimax regret and minimax risk
- Information-theoretic determination of minimax rates of convergence
- Reinforcement learning with immediate rewards and linear hypotheses
- Cryptographic hardness for learning intersections of halfspaces
- Regret Minimization for Reserve Prices in Second-Price Auctions
- A Tutorial on Thompson Sampling
- The Nonstochastic Multiarmed Bandit Problem
- Deep Neural Networks for Estimation and Inference
- Bandit Algorithms
- Online Decision Making with High-Dimensional Covariates
- Introduction to Multi-Armed Bandits
- The computational power of optimization in online learning