The following pages link to (Q4617639):
Displaying 4 items.
- Near-optimal regret bounds for reinforcement learning (Q2896090)
- Mean-Variance Tradeoffs in an Undiscounted MDP (Q4287609)
- Temporal concatenation for Markov decision processes (Q5051192)
- Settling the sample complexity of model-based offline reinforcement learning (Q6192326)