The policy iteration algorithm for average reward Markov decision processes with general state space
From MaRDI portal
Publication:4395828
DOI: 10.1109/9.650016; zbMath: 0906.93063; OpenAlex: W2115558605; Wikidata: Q114991401; Scholia: Q114991401; MaRDI QID: Q4395828
Publication date: 12 August 1998
Published in: IEEE Transactions on Automatic Control
Full work available at URL: https://doi.org/10.1109/9.650016
optimal control; queueing networks; deterministic routing; controlled Markov chains; Howard's policy iteration algorithm
Stochastic network models in operations research (90B15); Optimal stochastic control (93E20); Applications of Markov renewal processes (reliability, queueing networks, etc.) (60K20)
Related Items
A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications ⋮ Potential-based least-squares policy iteration for a parameterized feedback control system ⋮ Unnamed Item ⋮ The policy iteration algorithm for a compound Poisson process applied to optimal dividend strategies under a Cramér-Lundberg risk model ⋮ An optimal control approach to day-to-day congestion pricing for stochastic transportation networks ⋮ The policy iteration algorithm for average continuous control of piecewise deterministic Markov processes ⋮ A note on the existence of optimal stationary policies for average Markov decision processes with countable states ⋮ Optimal Inventory Control with Jump Diffusion and Nonlinear Dynamics in the Demand ⋮ Stochastic control via direct comparison ⋮ On Iteration Improvement for Averaged Expected Cost Control for One-Dimensional Ergodic Diffusions ⋮ Weak convergence and fluid limits in optimal time-to-empty queueing control problems ⋮ Approximate receding horizon approach for Markov decision processes: average reward case ⋮ Completion-of-squares: revisited and extended ⋮ Average control of Markov decision processes with Feller transition probabilities and general action spaces ⋮ Average Cost Optimality Inequality for Markov Decision Processes with Borel Spaces and Universally Measurable Policies ⋮ Weakly coupled event triggered output feedback system in wireless networked control systems ⋮ A policy improvement method for constrained average Markov decision processes ⋮ Planning for the long run: programming with patient, Pareto responsive preferences ⋮ Policy iteration for continuous-time average reward Markov decision processes in Polish spaces ⋮ Coding and control for communication networks ⋮ Reliability by design in distributed power transmission networks ⋮ On the Minimum Pair Approach for Average Cost Markov Decision Processes with Countable Discrete Action Spaces and Strictly Unbounded Costs ⋮ On structural properties of optimal average cost functions in Markov decision processes with Borel spaces and universally measurable policies ⋮ Dispatching to parallel servers. Solutions of Poisson's equation for first-policy improvement ⋮ Single sample path-based optimization of Markov chains ⋮ Dynamic load balancing in parallel queueing systems: stability and optimal control ⋮ Dynamic safety-stocks for asymptotic optimality in stochastic networks