scientific article; zbMATH DE number 6276207
zbMath: 1433.68361; MaRDI QID: Q5405216
Mohammad Ghavamzadeh, Rémi Munos, Alessandro Lazaric
Publication date: 1 April 2014
Full work available at URL: http://www.jmlr.org/papers/v13/lazaric12a.html
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Markov decision processes; reinforcement learning; least-squares policy iteration; generalization bounds; finite-sample analysis; least-squares temporal-difference
Linear regression; mixed models (62J05)
Learning and adaptive systems in artificial intelligence (68T05)
Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) (60J20)
Markov and semi-Markov decision processes (90C40)