Actor-Critic–Type Learning Algorithms for Markov Decision Processes
Publication:4943714
DOI: 10.1137/S036301299731669X, zbMath: 0938.93069, OpenAlex: W2082261506, MaRDI QID: Q4943714
Vijaymohan R. Konda, Vivek S. Borkar
Publication date: 19 March 2000
Published in: SIAM Journal on Control and Optimization
Full work available at URL: https://doi.org/10.1137/s036301299731669x
Learning and adaptive systems in artificial intelligence (68T05); Time-scale analysis and singular perturbations in control/observation systems (93C70); Stochastic approximation (62L20); Stochastic learning and adaptive control (93E35)
Related Items (31)
Recursive regression estimation based on the two-time-scale stochastic approximation method and Bernstein polynomials ⋮ Learning with Limited Samples: Meta-Learning and Applications to Communication Systems ⋮ Convergence rate of linear two-time-scale stochastic approximation. ⋮ A constrained optimization perspective on actor-critic algorithms and application to network routing ⋮ Multiscale Q-learning with linear function approximation ⋮ Asynchronous stochastic approximation with differential inclusions ⋮ Actor-Critic–Like Stochastic Adaptive Search for Continuous Simulation Optimization ⋮ Actor-critic algorithms for hierarchical Markov decision processes ⋮ Reinforcement learning based algorithms for average cost Markov decision processes ⋮ Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms ⋮ A Small Gain Analysis of Single Timescale Actor Critic ⋮ Risk-Sensitive Reinforcement Learning via Policy Gradient Search ⋮ An actor-critic algorithm with policy gradients to solve the job shop scheduling problem using deep double recurrent agents ⋮ On the sample complexity of actor-critic method for reinforcement learning with function approximation ⋮ Two-time-scale nonparametric recursive regression estimator for independent functional data ⋮ Two-timescale stochastic gradient descent in continuous time with applications to joint online parameter estimation and optimal sensor placement ⋮ New algorithms of the Q-learning type ⋮ Reinforcement learning for long-run average cost. ⋮ Convergent multiple-timescales reinforcement learning algorithms in normal form games ⋮ Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies ⋮ A two-level hierarchical Markov decision model with considering interaction between levels ⋮ The Borkar-Meyn theorem for asynchronous stochastic approximations ⋮ An actor-critic algorithm for constrained Markov decision processes ⋮ Stochastic approximation algorithms: overview and recent trends. ⋮ REINFORCEMENT LEARNING IN MARKOVIAN EVOLUTIONARY GAMES ⋮ A sensitivity formula for risk-sensitive cost and the actor-critic algorithm ⋮ Empirical Dynamic Programming ⋮ Natural actor-critic algorithms ⋮ Dynamic pricing models for electronic business ⋮ A reinforcement learning algorithm for rescheduling preempted tasks in fog nodes ⋮ Empirical Q-Value Iteration