Batch policy learning in average reward Markov decision processes (Q2112817)

From MaRDI portal





scientific article; zbMATH DE number 7641129

    Statements

    Batch policy learning in average reward Markov decision processes (English)
    12 January 2023
    Markov decision process
    average reward
    policy optimization
    doubly robust estimator

    Identifiers