Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits (Q6371461)

From MaRDI portal

preprint article from arXiv
Language: English
Label: Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits
Description: preprint article from arXiv

    Statements

    Date: 28 June 2021
    arXiv categories: stat.ML, cs.AI, cs.IT, cs.LG, cs.RO, math.IT
    Authors: Wenshuo Guo, Kumar Krishna Agrawal, Aditya Grover, Vidya Muthukumar, Ashwin Pananjady
