Batch policy learning in average reward Markov decision processes
From MaRDI portal
Publication:2112817
DOI: 10.1214/22-AOS2231 · MaRDI QID: Q2112817
Zhengling Qi, Runzhe Wan, Peng Liao, Predrag Klasnja, Susan A. Murphy
Publication date: 12 January 2023
Published in: The Annals of Statistics
Full work available at URL: https://arxiv.org/abs/2007.11771
Related Items (3)
- A multiagent reinforcement learning framework for off-policy evaluation in two-sided markets
- Off-policy evaluation in partially observed Markov decision processes under sequential ignorability
- Projected state-action balancing weights for offline reinforcement learning
Cites Work
- Doubly robust policy evaluation and optimization
- Dynamic treatment regimes: technical challenges and applications
- Model selection in reinforcement learning
- On the limited memory BFGS method for large scale optimization
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
- Kernel-based reinforcement learning
- The landscape of empirical risk for nonconvex losses
- Statistical consistency and asymptotic normality for high-dimensional robust \(M\)-estimators
- Learning Algorithms for Markov Decision Processes with Average Cost
- Semiparametric efficiency bounds
- Support Vector Machines
- Asymptotic Statistics
- Estimation of Regression Coefficients When Some Regressors Are Not Always Observed
- Marginal Mean Models for Dynamic Regimes
- Constructing dynamic treatment regimes over indefinite time horizons
- 10.1162/1532443041827907
- A Robust Method for Estimating Optimal Treatment Regimes
- Double/debiased machine learning for treatment and structural parameters
- Estimating Dynamic Treatment Regimes in Mobile Health Using V-Learning
- New Statistical Learning Methods for Estimating Optimal Dynamic Treatment Regimes
- Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions
- Off-Policy Estimation of Long-Term Average Outcomes With Applications to Mobile Health
- Resampling‐based confidence intervals for model‐free robust inference on optimal treatment regimes