scientific article; zbMATH DE number 7014219
Publication:4617639
zbMath 1405.68307; arXiv 1803.01626; MaRDI QID Q4617639
Mohammad Talebi, Odalric-Ambrym Maillard
Publication date: 6 February 2019
Full work available at URL: https://arxiv.org/abs/1803.01626
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Markov decision processes; concentration inequalities; regret minimization; undiscounted reinforcement learning; Bellman optimality
Learning and adaptive systems in artificial intelligence (68T05); Markov and semi-Markov decision processes (90C40)
Related Items (2)
Temporal concatenation for Markov decision processes ⋮ Settling the sample complexity of model-based offline reinforcement learning