Off-policy evaluation in partially observed Markov decision processes under sequential ignorability
From MaRDI portal
Publication: 6183750
DOI: 10.1214/23-aos2287
arXiv: 2110.12343
MaRDI QID: Q6183750
Publication date: 4 January 2024
Published in: The Annals of Statistics
Full work available at URL: https://arxiv.org/abs/2110.12343
Cites Work
- Planning and acting in partially observable stochastic domains
- Performance guarantees for individualized treatment rules
- From local kernel to nonlocal multiple-model image denoising
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
- On spatially adaptive estimation of nonparametric regression
- Ambiguous partially observable Markov decision processes: structural results and applications
- Unified methods for censored longitudinal data and causality
- Batch policy learning in average reward Markov decision processes
- Weighted sums of certain dependent random variables
- A central limit theorem and a strong mixing condition
- State of the Art—A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms
- The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs
- The Optimal Control of Partially Observable Markov Processes over a Finite Horizon
- Marginal Structural Models to Estimate the Joint Causal Effect of Nonrandomized Treatments
- Asymptotic Statistics
- Identifying causal effects with proxy variables of an unmeasured confounder
- Spatially Adaptive Estimation via Fitted Local Likelihood Techniques
- Who Should Be Treated? Empirical Welfare Maximization Methods for Treatment Choice
- Estimating Individualized Treatment Rules Using Outcome Weighted Learning
- Optimal Dynamic Treatment Regimes
- A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect
- Policy Learning With Observational Data
- Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
- Estimating Dynamic Treatment Regimes in Mobile Health Using V-Learning
- Universal Reinforcement Learning
- Optimal Structural Nested Models for Optimal Sequential Decisions
- Probability Inequalities for Sums of Bounded Random Variables
- An alternative point of view on Lepski's method
- Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions
- Learning When-to-Treat Policies
- Off-Policy Estimation of Long-Term Average Outcomes With Applications to Mobile Health