
Actor-Critic--Type Learning Algorithms for Markov Decision Processes

From MaRDI portal
Publication:4943714

DOI: 10.1137/S036301299731669X · zbMath: 0938.93069 · OpenAlex: W2082261506 · MaRDI QID: Q4943714

Vijaymohan R. Konda, Vivek S. Borkar

Publication date: 19 March 2000

Published in: SIAM Journal on Control and Optimization

Full work available at URL: https://doi.org/10.1137/s036301299731669x




Related Items (31)

Recursive regression estimation based on the two-time-scale stochastic approximation method and Bernstein polynomials
Learning with Limited Samples: Meta-Learning and Applications to Communication Systems
Convergence rate of linear two-time-scale stochastic approximation.
A constrained optimization perspective on actor-critic algorithms and application to network routing
Multiscale Q-learning with linear function approximation
Asynchronous stochastic approximation with differential inclusions
Actor-Critic–Like Stochastic Adaptive Search for Continuous Simulation Optimization
Actor-critic algorithms for hierarchical Markov decision processes
Reinforcement learning based algorithms for average cost Markov decision processes
Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms
A Small Gain Analysis of Single Timescale Actor Critic
Risk-Sensitive Reinforcement Learning via Policy Gradient Search
An actor-critic algorithm with policy gradients to solve the job shop scheduling problem using deep double recurrent agents
On the sample complexity of actor-critic method for reinforcement learning with function approximation
Two-time-scale nonparametric recursive regression estimator for independent functional data
Two-timescale stochastic gradient descent in continuous time with applications to joint online parameter estimation and optimal sensor placement
New algorithms of the Q-learning type
Reinforcement learning for long-run average cost.
Convergent multiple-timescales reinforcement learning algorithms in normal form games
Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
A two-level hierarchical Markov decision model with considering interaction between levels
The Borkar-Meyn theorem for asynchronous stochastic approximations
An actor-critic algorithm for constrained Markov decision processes
Stochastic approximation algorithms: overview and recent trends.
REINFORCEMENT LEARNING IN MARKOVIAN EVOLUTIONARY GAMES
A sensitivity formula for risk-sensitive cost and the actor-critic algorithm
Empirical Dynamic Programming
Natural actor-critic algorithms
Dynamic pricing models for electronic business
A reinforcement learning algorithm for rescheduling preempted tasks in fog nodes
Empirical Q-Value Iteration



