Gradient temporal-difference learning for off-policy evaluation using emphatic weightings (Q6146179)

scientific article; zbMATH DE number 7786202

Language	Label	Description	Also known as
English	Gradient temporal-difference learning for off-policy evaluation using emphatic weightings	scientific article; zbMATH DE number 7786202

Statements

instance of

scholarly article

0 references

title

Gradient temporal-difference learning for off-policy evaluation using emphatic weightings (English)

0 references

DOI

10.1016/j.ins.2021.08.082

0 references

10 January 2024

0 references

Mathematics Subject Classification ID

0 references

reinforcement learning

0 references

off-policy evaluation

0 references

temporal-difference learning

0 references

gradient temporal-difference learning

0 references

emphatic approach

0 references

MaRDI profile type

MaRDI publication profile

0 references

full work available at URL

https://doi.org/10.1016/j.ins.2021.08.082

0 references

Recruitment-imitation mechanism for evolutionary reinforcement learning

0 references

Marginal Mean Models for Dynamic Regimes

0 references

Q2934010

0 references

Q2810885

0 references

\(\mathcal{Q}\)-learning

0 references

Q2953645

0 references

An analysis of temporal-difference learning with function approximation

0 references

The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning

0 references

Sitelinks

Mathematics (1 entry)

mardi Publication:6146179