Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme - MaRDI portal

Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme (Q5153609)

From MaRDI portal

scientific article; zbMATH DE number 7404789

Language	Label	Description	Also known as
English	Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme	scientific article; zbMATH DE number 7404789

Statements

scholarly article

Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme (English)

Vivek S. Borkar

Hars P. Dolhare

Konstantin E. Avrachenkov

Modern Trends in Controlled Stochastic Processes:

publication date

30 September 2021

full work available at URL

https://arxiv.org/abs/2103.05981

zbMATH Keywords

Markov decision process (MDP)

approximate dynamic programming

deep reinforcement learning (DRL)

stochastic approximation

deep Q-network (DQN)

full gradient DQN

Bellman error minimization

MaRDI profile type

MaRDI publication profile

An Introduction to Deep Reinforcement Learning

On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

Simulation-based optimization of Markov reward processes

Actor-Critic Algorithms with Online Feature Adaptation

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

Asynchronous stochastic approximation and Q-learning

An analysis of temporal-difference learning with function approximation

\({\mathcal Q}\)-learning

Stochastic Recursive Inclusions in Two Timescales with Nonadditive Iterate-Dependent Markov Noise

Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations

The Theory of Max-Min, with Applications

Convergence of a stochastic approximation version of the EM algorithm

Identifiers

zbMATH Open document ID

0 references

10.1007/978-3-030-76928-4_10

Mathematics Subject Classification ID

zbMATH DE Number

Sitelinks

Mathematics(1 entry)

mardi Publication:5153609
