Logarithmic regret bounds for continuous-time average-reward Markov decision processes
Publication: 6608781
DOI: 10.1137/23M1584101
MaRDI QID: Q6608781
Publication date: 20 September 2024
Published in: SIAM Journal on Control and Optimization
Keywords: average reward; continuous-time Markov decision processes; stochastic comparison; instance-dependent regret bounds; upper confidence reinforcement learning
MSC classification: Markov and semi-Markov decision processes (90C40); Continuous-time Markov processes on discrete state spaces (60J27)
Cites Work
- UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
- Continuous-time Markov decision processes. Theory and applications
- Technical Note—An Equivalence Between Continuous and Discrete Time Markov Decision Processes
- Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning
- Comparing counting processes and queues
- Optimal Adaptive Policies for Markov Decision Processes
- Queueing Network Controls via Deep Reinforcement Learning
- Continuous-Time Markov Decision Processes
- Bandits With Heavy Tail
- A Queueing Reward System with Several Customer Classes
- Reinforcement Learning for Linear-Convex Models with Jumps via Stability Analysis of Feedback Controls
- Optimal Scheduling of Entropy Regularizer for Continuous-Time Linear-Quadratic Reinforcement Learning