Reinforcement learning for long-run average cost.


Publication:1427588


DOI: 10.1016/S0377-2217(02)00874-3; zbMath: 1102.90374; MaRDI QID: Q1427588

Abhijit Gosavi

Publication date: 14 March 2004

Published in: European Journal of Operational Research

zbMATH Keywords

Stochastic processes; Reinforcement learning; Two time scales

Mathematics Subject Classification ID

Markov and semi-Markov decision processes (90C40)

Related Items

A reinforcement-learning approach for admission control in distributed network service systems
A policy gradient method for semi-Markov decision processes with application to call admission control
Reinforcement learning for joint pricing, lead-time and scheduling decisions in make-to-order systems
Reinforcement learning algorithms with function approximation: recent advances and applications
Approximate dynamic programming for capacity allocation in the service industry
Application of reinforcement learning to the game of Othello
Minimizing mean weighted tardiness in unrelated parallel machine scheduling with reinforcement learning
A performance-centred approach to optimising maintenance of complex systems
Look-ahead control of conveyor-serviced production station by using potential-based online policy iteration

