On-line policy gradient estimation with multi-step sampling - MaRDI portal

On-line policy gradient estimation with multi-step sampling (Q5962027)

From MaRDI portal
scientific article; zbMATH DE number 5786411

    Statements

    On-line policy gradient estimation with multi-step sampling (English)
    16 September 2010
    The authors discuss sample-path-based (on-line) performance gradient estimation for Markov systems. Existing on-line performance gradient estimation algorithms generally require a standard importance sampling assumption, and examples are given to illustrate that these approaches cannot provide an accurate gradient estimate when the assumption does not hold. It is shown that the assumption can be relaxed, and a few new algorithms based on multi-step sampling are proposed that do not require it. All the algorithms can be implemented on sample paths, so policy gradients can be estimated on-line.
    Markov reward processes
    on-line estimation
    performance potentials
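
The review above does not spell out the importance sampling assumption. The sketch below assumes it takes the form commonly used with likelihood-ratio gradient estimates: whenever the derivative of a transition probability p(i,j) with respect to the parameter is nonzero, p(i,j) itself must be positive. Under that reading, the average-reward gradient can be written with performance potentials g as the sum over i, j of pi(i) * dp(i,j)/dtheta * g(j), and a sample path can only weight transitions it actually visits, that is, those with p(i,j) > 0. The 3-state chain, the reward vector and all function names here are hypothetical choices made for illustration; this is not the authors' multi-step algorithm, only a small numerical picture of why the single-step ratio form can miss part of the gradient.

```python
# Illustration only: the chain and rewards are invented for this sketch,
# not taken from the article.

def transition_matrix(theta):
    """Hypothetical parameterised transition matrix P(theta), 0 <= theta <= 0.5."""
    return [
        [0.5 - theta, 0.5, theta],
        [0.3, 0.3, 0.4],
        [0.2, 0.5, 0.3],
    ]

# Entry-wise derivative dP/dtheta (constant for this example).
D_P = [
    [-1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

REWARD = [1.0, 0.0, 3.0]  # hypothetical reward f(i) for each state


def stationary(P, iters=200):
    """Stationary distribution (pi = pi P) by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def average_reward(theta):
    pi = stationary(transition_matrix(theta))
    return sum(p * r for p, r in zip(pi, REWARD))


def potentials(P, f, eta, iters=200):
    """Performance potentials g(i) = sum over n >= 0 of ((P^n f)(i) - eta), truncated."""
    g = [0.0] * len(P)
    v = list(f)
    for _ in range(iters):
        g = [gi + vi - eta for gi, vi in zip(g, v)]
        v = [sum(p * x for p, x in zip(row, v)) for row in P]
    return g


def gradients_at(theta):
    """Return (potential-based gradient, single-step likelihood-ratio form)."""
    P = transition_matrix(theta)
    n = len(P)
    pi = stationary(P)
    eta = sum(p * r for p, r in zip(pi, REWARD))
    g = potentials(P, REWARD, eta)
    exact = sum(pi[i] * D_P[i][j] * g[j] for i in range(n) for j in range(n))
    # Only transitions with positive probability can appear on a sample path,
    # so the ratio dP/P is summed over those entries alone.
    ratio_form = sum(
        pi[i] * P[i][j] * (D_P[i][j] / P[i][j]) * g[j]
        for i in range(n) for j in range(n) if P[i][j] > 0
    )
    return exact, ratio_form


exact, ratio_form = gradients_at(0.0)
h = 1e-6
finite_diff = (average_reward(h) - average_reward(0.0)) / h  # one-sided check
print("potential formula:", exact)
print("ratio form       :", ratio_form)
print("finite difference:", finite_diff)
```

At theta = 0 the transition from state 0 to state 2 has zero probability but a nonzero derivative, so the ratio form drops that term and disagrees with both the potential formula and the finite-difference check. At an interior value such as theta = 0.1 every such transition has positive probability and the two forms coincide.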

    Identifiers