scientific article; zbMATH DE number 7370614
Publication:4999027
Alberto Maria Metelli, Daniele Calandriello, Matteo Pirotta, Marcello Restelli
Publication date: 9 July 2021
Full work available at URL: https://jmlr.csail.mit.edu/papers/v22/19-707.html
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Markov decision process; reinforcement learning; approximate dynamic programming; approximate policy iteration; policy chattering; policy oscillation
Related Items (2)
Smoothing policies and safe policy gradients ⋮ Efficient reductions in cyclotomic rings -- application to Ring LWE based FHE schemes
Cites Work
- Unnamed Item
- On the convergence of Neumann series in Banach space
- On the existence of fixed points for approximate value iteration and temporal-difference learning
- The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate
- Approximate policy iteration: a survey and some new methods
- The Linear Programming Approach to Approximate Dynamic Programming
- 10.1162/1532443041827907
- Perturbation bounds for the stationary probabilities of a finite Markov chain
- Information Theory