Batch policy learning in average reward Markov decision processes (Q2112817)

From MaRDI portal

scientific article; zbMATH DE number 7641129

Language	Label	Description	Also known as
English	Batch policy learning in average reward Markov decision processes	scientific article; zbMATH DE number 7641129

Statements

scholarly article

Batch policy learning in average reward Markov decision processes (English)

Predrag Klasnja

Susan A. Murphy

The Annals of Statistics

publication date

12 January 2023

full work available at URL

https://arxiv.org/abs/2007.11771

https://projecteuclid.org/journals/annals-of-statistics/volume-50/issue-6/Batch-policy-learning-in-average-reward-Markov-decision-processes/10.1214/22-AOS2231.full

zbMATH Keywords

Markov decision process

average reward

policy optimization

doubly robust estimator

describes a project that uses

MaRDI profile type

cites work

Learning algorithms for Markov decision processes with average cost

Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path

Double/debiased machine learning for treatment and structural parameters

Doubly robust policy evaluation and optimization

Constructing dynamic treatment regimes over indefinite time horizons

Model selection in reinforcement learning

Regularized policy iteration with nonparametric function spaces

Dynamic treatment regimes: technical challenges and applications

10.1162/1532443041827907

Off-Policy Estimation of Long-Term Average Outcomes With Applications to Mobile Health

On the limited memory BFGS method for large scale optimization

Statistical consistency and asymptotic normality for high-dimensional robust \(M\)-estimators

Estimating Dynamic Treatment Regimes in Mobile Health Using V-Learning

The landscape of empirical risk for nonconvex losses

Marginal Mean Models for Dynamic Regimes

Semiparametric efficiency bounds

Kernel-based reinforcement learning

Estimation of Regression Coefficients When Some Regressors Are Not Always Observed

Support Vector Machines

Asymptotic Statistics

Resampling‐based confidence intervals for model‐free robust inference on optimal treatment regimes

A Robust Method for Estimating Optimal Treatment Regimes

Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions

New Statistical Learning Methods for Estimating Optimal Dynamic Treatment Regimes

Identifiers

DOI

10.1214/22-AOS2231

Mathematics Subject Classification ID

zbMATH DE Number

7641129

Sitelinks

Mathematics (1 entry)

mardi Publication:2112817
