Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits (Q6371461)
From MaRDI portal
preprint article from arXiv
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits | preprint article from arXiv | |
Statements
28 June 2021
stat.ML
cs.AI
cs.IT
cs.LG
cs.RO
math.IT
Wenshuo Guo
Kumar Krishna Agrawal
Aditya Grover
Vidya Muthukumar
Ashwin Pananjady