Learning algorithms for Markov decision processes with average cost (Q2753225)

From MaRDI portal
scientific article; zbMATH DE number 1667501

    Statements

    Publication date: 29 October 2001
    Keywords: relative value iteration algorithm; stability
    The authors propose, and for the first time rigorously analyze the convergence of, two \(Q\)-learning algorithms for the optimal average cost control problem. One is a stochastic approximation analogue of the classical relative value iteration (RVI) algorithm; the other is a stochastic approximation analogue of a value iteration algorithm based on the stochastic shortest path (SSP) formulation of the problem [cf. \textit{D. Bertsekas} and \textit{J. N. Tsitsiklis}, Neuro-dynamic programming, Athena Scientific, Belmont, MA (1996; Zbl 0924.68163); \textit{V. S. Borkar} and \textit{S. P. Meyn}, SIAM J. Control Optimization 38, 447-469 (2000; Zbl 0990.62071)]. Both synchronous and asynchronous implementations are considered, and they are analyzed by studying the almost sure boundedness of \(\{Q^n\}\) and the stability of the associated ODE for the \(Q\)-factors. The authors point out that, since the convergence of reinforcement learning schemes is usually slow for large state spaces, acceleration techniques are worth exploring; moreover, the case of an infinite state space remains open. Both are important directions for future work in this area.
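    The RVI \(Q\)-learning scheme reviewed above updates \(Q_{n+1}(i,a) = Q_n(i,a) + \alpha_n\,[c(i,a) + \min_b Q_n(j,b) - f(Q_n) - Q_n(i,a)]\), where the offset \(f(Q) = Q(i_0,a_0)\) for a fixed reference pair tracks the optimal average cost. A minimal synchronous sketch, on a hypothetical two-state deterministic MDP (the MDP, step-size schedule, and reference pair are illustrative assumptions, not taken from the paper):

```python
# Sketch of synchronous RVI Q-learning on a toy deterministic MDP
# (hypothetical example, not from the paper under review).
# cost[s][a] is the one-step cost, nxt[s][a] the successor state.
# The optimal policy loops in state 0 at cost 1, so the optimal
# average cost of this toy MDP is 1.
cost = [[1.0, 3.0],
        [2.0, 5.0]]
nxt = [[0, 1],
       [1, 0]]

REF_S, REF_A = 0, 0  # reference pair: f(Q) = Q[REF_S][REF_A]

Q = [[0.0, 0.0], [0.0, 0.0]]
for n in range(20000):
    alpha = 1.0 / (n + 10) ** 0.7      # tapering step-size schedule
    f_q = Q[REF_S][REF_A]              # offset term of the RVI scheme
    for s in range(2):
        for a in range(2):
            s2 = nxt[s][a]             # next state (deterministic here)
            target = cost[s][a] + min(Q[s2]) - f_q
            Q[s][a] += alpha * (target - Q[s][a])

print(round(Q[REF_S][REF_A], 2))  # estimate of the optimal average cost
```

    At the fixed point, \(Q(i_0,a_0)\) equals the optimal average cost (here 1); in a genuinely stochastic MDP the next state would be sampled, and the asynchronous variant analyzed in the paper would update only the currently visited state-action pair.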