Performance Bounds in $L_p$‐norm for Approximate Value Iteration
Publication:5453575
DOI: 10.1137/040614384, zbMath: 1356.90159, OpenAlex: W2012547817, MaRDI QID: Q5453575
Publication date: 3 April 2008
Published in: SIAM Journal on Control and Optimization
Full work available at URL: https://doi.org/10.1137/040614384
Keywords: optimal control, dynamic programming, Markov decision processes, error analysis, reinforcement learning, function approximation, statistical learning
Dynamic programming in optimal control and differential games (49L20); Approximation methods and heuristics in mathematical programming (90C59); Optimal stochastic control (93E20); Markov and semi-Markov decision processes (90C40)
Related Items (7)
A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning
A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces
Transfer learning for contextual multi-armed bandits
Settling the sample complexity of model-based offline reinforcement learning
Quadratic approximate dynamic programming for input‐affine systems
A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs
Multi-agent reinforcement learning: a selective overview of theories and algorithms