scientific article
Publication:2896031
zbMath: 1242.68217, MaRDI QID: Q2896031
Publication date: 13 July 2012
Full work available at URL: http://www.jmlr.org/papers/v11/dicastro10a.html
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Learning and adaptive systems in artificial intelligence (68T05)
Stopping times; optimal stopping problems; gambling theory (60G40)
Online algorithms; streaming algorithms (68W27)
Related Items (3)
A Small Gain Analysis of Single Timescale Actor Critic ⋮ On the sample complexity of actor-critic method for reinforcement learning with function approximation ⋮ Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies