Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
From MaRDI portal
Publication: 3529915
DOI: 10.1007/978-3-540-87987-9_14
zbMath: 1156.90456
OpenAlex: W2181151646
MaRDI QID: Q3529915
Publication date: 14 October 2008
Published in: Lecture Notes in Computer Science
Full work available at URL: https://doi.org/10.1007/978-3-540-87987-9_14
Mathematics Subject Classification: Stopping times; optimal stopping problems; gambling theory (60G40); Markov and semi-Markov decision processes (90C40)
Related Items (1)
Cites Work
- Asymptotically efficient adaptive allocation rules
- A characterization of the minimum cycle mean in a digraph
- Markov chain sensitivity measured by mean first passage times
- Near-optimal reinforcement learning in polynomial time
- Optimal learning and experimentation in bandit problems
- Mixing times with applications to perturbed Markov chains
- Finding minimum cost to time ratio cycles with small integral transit times
- Pseudometrics for State Aggregation in Average Reward Markov Decision Processes
- Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching cost
- Optimal Adaptive Policies for Markov Decision Processes
- The Nonstochastic Multiarmed Bandit Problem
- Improved Rates for the Stochastic Continuum-Armed Bandit Problem
- Faster parametric shortest path and minimum-balance algorithms
- Finite-time analysis of the multiarmed bandit problem