Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning
From MaRDI portal
Publication:5089723
DOI: 10.1137/21M140924X; OpenAlex: W3202341388; MaRDI QID: Q5089723
No author found.
Publication date: 15 July 2022
Published in: SIAM Journal on Mathematics of Data Science
Full work available at URL: https://arxiv.org/abs/2110.00871
Computational learning theory (68Q32); Learning and adaptive systems in artificial intelligence (68T05)
Cites Work
- A decision-theoretic generalization of on-line learning and an application to boosting
- Information-theoretic determination of minimax rates of convergence
- The Nonstochastic Multiarmed Bandit Problem
- DOI: 10.1162/153244303321897663
- Competitive On-line Statistics
- Learning to Optimize via Posterior Sampling
- Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits Under Realizability