On-line policy gradient estimation with multi-step sampling (Q5962027)

From MaRDI portal

scientific article; zbMATH DE number 5786411

Language	Label	Description	Also known as
English	On-line policy gradient estimation with multi-step sampling	scientific article; zbMATH DE number 5786411

Statements

scholarly article

On-line policy gradient estimation with multi-step sampling (English)

Discrete Event Dynamic Systems

publication date

16 September 2010

The authors discuss the problem of sample-path-based (on-line) performance gradient estimation for Markov systems. Existing on-line performance gradient estimation algorithms generally require a standard importance sampling assumption, and examples are given to illustrate that these approaches cannot provide an accurate gradient estimate when the assumption does not hold. It is shown that the assumption can be relaxed, and a few new algorithms based on multi-step sampling are proposed that do not require it. All of the algorithms can be implemented on sample paths, so policy gradients can be estimated on-line.

Irina V. Konopleva

zbMATH Keywords

Markov reward processes

on-line estimation

performance potentials

MaRDI profile type

MaRDI publication profile

full work available at URL

https://doi.org/10.1007/s10626-009-0078-3

A basic formula for online policy gradient algorithms

Perturbation realization, potentials, and sensitivity analysis of Markov processes

Simulation-based optimization of Markov reward processes

Identifiers

zbMATH Open document ID

Mathematics Subject Classification ID

zbMATH DE Number

DOI

10.1007/S10626-009-0078-3

Sitelinks

Mathematics(1 entry)

mardi Publication:5962027
