Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes (Q2687069)

scientific article; zbMATH DE number 7658270

Language	Label	Description	Also known as
English	Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes	scientific article; zbMATH DE number 7658270

Statements

instance of

scholarly article

0 references

title

Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes (English)

0 references

author

Guanghui Lan

0 references

published in

Mathematical Programming. Series A. Series B

0 references

publication date

1 March 2023

0 references

full work available at URL

https://arxiv.org/abs/2102.00135

0 references

MaRDI profile type

Publication

0 references

cites work

Functional Approximations and Dynamic Programming

0 references

On the convergence properties of non-Euclidean extragradient methods for variational inequalities with generalized monotone operators

0 references

Online Markov Decision Processes

0 references

Finite-Dimensional Variational Inequalities and Complementarity Problems

0 references

First-order and stochastic optimization methods for machine learning

0 references

Robust Stochastic Approximation Approach to Stochastic Programming

0 references

Q3967358

0 references

Q4315289

0 references

High-Dimensional Probability

0 references

Identifiers

Mathematics Subject Classification ID

0 references

DOI

10.1007/S10107-022-01816-5

0 references

Sitelinks

Mathematics (1 entry)

mardi Publication:2687069