Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities
From MaRDI portal
Publication:5265786
DOI: 10.1080/17442508.2014.939979
zbMath: 1317.90317
OpenAlex: W2011519669
MaRDI QID: Q5265786
Tomás Prieto-Rumeau, François Dufour
Publication date: 29 July 2015
Published in: Stochastics
Full work available at URL: https://doi.org/10.1080/17442508.2014.939979
Markov decision processes; long-run average cost; Wasserstein distance; concentration inequalities; approximation of the optimal value and an optimal policy
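The keywords describe the paper's core device: bounding the 1-Wasserstein distance between an empirical measure and the true law, with concentration inequalities quantifying the convergence rate. As a minimal, purely illustrative sketch (not code from the paper), the one-dimensional identity W1(μ, ν) = ∫ |F_μ(t) − F_ν(t)| dt makes this distance easy to evaluate numerically; the standard normal target, the grid bounds ±6, and the function name are assumptions chosen for the demo.

```python
import math
import numpy as np

def w1_to_std_normal(samples, lo=-6.0, hi=6.0, m=2001):
    """Approximate W1 between the empirical measure of `samples` and N(0, 1),
    using the 1-D identity W1 = integral over t of |F_n(t) - F(t)| dt.
    (Illustrative helper, not from the paper; grid bounds are a demo choice.)"""
    grid = np.linspace(lo, hi, m)
    # Empirical CDF F_n evaluated on the grid.
    fn = np.searchsorted(np.sort(samples), grid, side="right") / len(samples)
    # Standard normal CDF F via the error function.
    f = 0.5 * (1.0 + np.array([math.erf(t / math.sqrt(2.0)) for t in grid]))
    # Riemann-sum approximation of the integral.
    return np.sum(np.abs(fn - f)) * (grid[1] - grid[0])

rng = np.random.default_rng(0)
for n in (10, 100, 1_000, 10_000):
    d = w1_to_std_normal(rng.normal(size=n))
    print(f"n={n:>6}  W1 ~ {d:.4f}")  # shrinks roughly like n**-0.5
```

The printed distances decay on the order of n^(−1/2), which is the kind of empirical-measure convergence the paper's concentration inequalities turn into non-asymptotic error bounds for the approximated MDP.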
Related Items
- A convex optimization approach to dynamic programming in continuous state and action spaces
- Computable approximations for continuous-time Markov decision processes on Borel spaces based on empirical measures
- From Infinite to Finite Programs: Explicit Error Bounds with Applications to Approximate Dynamic Programming
- Optimal deterministic controller synthesis from steady-state distributions
- Approximation of discounted minimax Markov control problems and zero-sum Markov games using Hausdorff and Wasserstein distances
- Computable approximations for average Markov decision processes in continuous time
- A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs
- Robustness to incorrect models and data-driven learning in average-cost optimal stochastic control
- A stability result for linear Markovian stochastic optimization problems
Cites Work
- Simple bounds for the convergence of empirical and occupation measures in 1-Wasserstein distance
- Approximation of Markov decision processes with general state space
- Average cost Markov control processes: Stability with respect to the Kantorovich metric
- Policy iteration for average cost Markov control processes on Borel spaces
- Approximate receding horizon approach for Markov decision processes: average reward case
- A time aggregation approach to Markov decision processes
- Approximate gradient methods in policy-space optimization of Markov reward processes
- A policy improvement method for constrained average Markov decision processes
- Learning Algorithms for Markov Decision Processes with Average Cost
- Finite Linear Programming Approximations of Constrained Discounted Markov Decision Processes
- Average optimality for Markov decision processes in Borel spaces: a new condition and approach
- On Actor-Critic Algorithms
- Convergence of Simulation-Based Policy Iteration
- Simulation-based optimization of Markov reward processes
- Discrete-Time Controlled Markov Processes with Average Cost Criterion: A Survey
- Convergence Results for Some Temporal Difference Methods Based on Least Squares
- Universal Reinforcement Learning
- Approximate Dynamic Programming