Logarithmic regret bounds for continuous-time average-reward Markov decision processes (Q6608781)
From MaRDI portal
scientific article; zbMATH DE number 7916661
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Logarithmic regret bounds for continuous-time average-reward Markov decision processes | scientific article; zbMATH DE number 7916661 | |
Statements
Logarithmic regret bounds for continuous-time average-reward Markov decision processes (English)
20 September 2024
continuous-time Markov decision processes
average reward
instance-dependent regret bounds
upper confidence reinforcement learning
stochastic comparison