scientific article; zbMATH DE number 6276207 - MaRDI portal

Publication:5405216

zbMath: 1433.68361; MaRDI QID: Q5405216

Mohammad Ghavamzadeh, Rémi Munos, Alessandro Lazaric

Publication date: 1 April 2014

Full work available at URL: http://www.jmlr.org/papers/v13/lazaric12a.html

Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.

zbMATH Keywords

Markov decision processes ⋮ reinforcement learning ⋮ least-squares policy iteration ⋮ generalization bounds ⋮ finite-sample analysis ⋮ least-squares temporal-difference

Mathematics Subject Classification ID

Linear regression; mixed models (62J05) ⋮ Learning and adaptive systems in artificial intelligence (68T05) ⋮ Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) (60J20) ⋮ Markov and semi-Markov decision processes (90C40)

Related Items (7)

Policy space identification in configurable environments ⋮ Batch mode reinforcement learning based on the synthesis of artificial trajectories ⋮ A concentration bound for \(\operatorname{LSPE}( \lambda )\) ⋮ Offline reinforcement learning with task hierarchies ⋮ A Q-learning predictive control scheme with guaranteed stability ⋮ Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling ⋮ Toward theoretical understandings of robust Markov decision processes: sample complexity and asymptotics
