The following pages link to Block Policy Mirror Descent (Q6093281):
- Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes (Q2687069)
- Softmax policy gradient methods can take exponential time to converge (Q6110457)
- Policy mirror descent inherently explores action space (Q6663113)