Regret Bounds for Reinforcement Learning via Markov Chain Concentration
Publication:5214808
DOI: 10.1613/jair.1.11316; zbMath: 1442.68198; arXiv: 1808.01813; OpenAlex: W3196847620; MaRDI QID: Q5214808
Publication date: 5 February 2020
Published in: Journal of Artificial Intelligence Research
Full work available at URL: https://arxiv.org/abs/1808.01813
Inequalities; stochastic orderings (60E15)
Learning and adaptive systems in artificial intelligence (68T05)
Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) (60J20)
Related Items (2)
EXPLORATION–EXPLOITATION POLICIES WITH ALMOST SURE, ARBITRARILY SLOW GROWING ASYMPTOTIC REGRET
Improved estimation of relaxation time in nonreversible Markov chains