Geometry of policy improvement
From MaRDI portal
Publication:1689145
DOI: 10.1007/978-3-319-68445-1_33
zbMath: 1426.91076
arXiv: 1704.01785
OpenAlex: W2606500941
MaRDI QID: Q1689145
Publication date: 12 January 2018
Full work available at URL: https://arxiv.org/abs/1704.01785
Keywords: reinforcement learning; partially observable Markov decision process; memoryless stochastic policy; policy gradient theorem