Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes

From MaRDI portal
Publication:2687069