Statistical Inference for Online Decision Making: In a Contextual Bandit Setting
Publication:5857145
DOI: 10.1080/01621459.2020.1770098; zbMath: 1457.62041; arXiv: 2010.07283; OpenAlex: W3030165768; MaRDI QID: Q5857145
Haoyu Chen, Rui Song, Wen-Bin Lu
Publication date: 30 March 2021
Published in: Journal of the American Statistical Association
Full work available at URL: https://arxiv.org/abs/2010.07283
Keywords: statistical inference; model misspecification; online decision making; epsilon-greedy; inverse propensity weighted estimator
MSC classification: Nonparametric estimation (62G05); Sequential statistical analysis (62L10); Compound decision problems in statistical decision theory (62C25)
Related Items
- A Single-Index Model With a Surface-Link for Optimizing Individualized Dose Rules
- Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection
Cites Work
- The multi-armed bandit problem with covariates
- Targeted sequential design for targeted learning inference of the optimal treatment rule and its mean reward
- Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates
- On the Allocation of Treatments in Sequential Medical Trials
- The Search for Optimality in Clinical Trials
- A One-Armed Bandit Problem with a Concomitant Variable
- Using Least Squares to Approximate Unknown Regression Functions
- Concordance-Assisted Learning for Estimating Optimal Individualized Treatment Regimes
- Estimating Individualized Treatment Rules Using Outcome Weighted Learning
- Optimal Dynamic Treatment Regimes
- 10.1162/153244303321897663
- A Robust Method for Estimating Optimal Treatment Regimes
- Online Decision Making with High-Dimensional Covariates
- A linear response bandit problem
- Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions
- Randomized allocation with arm elimination in a bandit problem with covariates