scientific article
From MaRDI portal
Publication:3093369
zbMath: 1222.68381; MaRDI QID: Q3093369
Publication date: 12 October 2011
Full work available at URL: http://www.jmlr.org/papers/v7/munos06b.html
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Keywords: optimal control, parametric optimization, gradient estimate, sensitivity analysis, reinforcement learning, likelihood ratio method, policy search, pathwise derivation
Learning and adaptive systems in artificial intelligence (68T05); Optimal stochastic control (93E20); Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.) (68T20)
Related Items (5)
Unnamed Item ⋮ Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality ⋮ Linear Convergence of a Policy Gradient Method for Some Finite Horizon Continuous Time Control Problems ⋮ Unnamed Item ⋮ Inhomogeneous deep Q-network for time sensitive applications