Actor-Critic--Type Learning Algorithms for Markov Decision Processes

Publication:4943714

DOI: 10.1137/S036301299731669X; zbMath: 0938.93069; OpenAlex: W2082261506; MaRDI QID: Q4943714

Vijaymohan R. Konda, Vivek S. Borkar

Publication date: 19 March 2000

Published in: SIAM Journal on Control and Optimization

Full work available at URL: https://doi.org/10.1137/s036301299731669x

zbMATH Keywords

Markov decision processes; stochastic approximation; reinforcement learning; simulated transitions

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05)
Time-scale analysis and singular perturbations in control/observation systems (93C70)
Stochastic approximation (62L20)
Stochastic learning and adaptive control (93E35)

Related Items (31)

Recursive regression estimation based on the two-time-scale stochastic approximation method and Bernstein polynomials
Learning with Limited Samples: Meta-Learning and Applications to Communication Systems
Convergence rate of linear two-time-scale stochastic approximation.
A constrained optimization perspective on actor-critic algorithms and application to network routing
Multiscale Q-learning with linear function approximation
Asynchronous stochastic approximation with differential inclusions
Actor-Critic–Like Stochastic Adaptive Search for Continuous Simulation Optimization
Actor-critic algorithms for hierarchical Markov decision processes
Reinforcement learning based algorithms for average cost Markov decision processes
Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms
A Small Gain Analysis of Single Timescale Actor Critic
Risk-Sensitive Reinforcement Learning via Policy Gradient Search
An actor-critic algorithm with policy gradients to solve the job shop scheduling problem using deep double recurrent agents
On the sample complexity of actor-critic method for reinforcement learning with function approximation
Two-time-scale nonparametric recursive regression estimator for independent functional data
Two-timescale stochastic gradient descent in continuous time with applications to joint online parameter estimation and optimal sensor placement
New algorithms of the Q-learning type
Reinforcement learning for long-run average cost.
Convergent multiple-timescales reinforcement learning algorithms in normal form games
Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
A two-level hierarchical Markov decision model with considering interaction between levels
The Borkar-Meyn theorem for asynchronous stochastic approximations
An actor-critic algorithm for constrained Markov decision processes
Stochastic approximation algorithms: overview and recent trends.
REINFORCEMENT LEARNING IN MARKOVIAN EVOLUTIONARY GAMES
A sensitivity formula for risk-sensitive cost and the actor-critic algorithm
Empirical Dynamic Programming
Natural actor-critic algorithms
Dynamic pricing models for electronic business
A reinforcement learning algorithm for rescheduling preempted tasks in fog nodes
Empirical Q-Value Iteration

