Batch policy learning in average reward Markov decision processes

Publication:2112817

DOI: 10.1214/22-AOS2231
MaRDI QID: Q2112817

Zhengling Qi, Runzhe Wan, Peng Liao, Predrag Klasnja, Susan A. Murphy

Publication date: 12 January 2023

Published in: The Annals of Statistics

Full work available at URL: https://arxiv.org/abs/2007.11771



