An analysis of temporal-difference learning with function approximation
From MaRDI portal
Publication: 4362297
DOI: 10.1109/9.580874
zbMATH: 0914.93075
OpenAlex: W2139418546
MaRDI QID: Q4362297
Benjamin Van Roy, John N. Tsitsiklis
Publication date: 6 May 1999
Published in: IEEE Transactions on Automatic Control
Full work available at URL: https://doi.org/10.1109/9.580874
MSC classifications:
Markov chains (discrete-time Markov processes on discrete state spaces) (60J10)
Stochastic learning and adaptive control (93E35)
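The paper classified above analyzes TD(λ) policy evaluation with linear function approximation. As a purely illustrative aside (not part of the record), a minimal TD(0) sketch on a toy random-walk chain — the chain, step sizes, and episode counts are all assumptions chosen for the example:

```python
import numpy as np

# Toy 5-state random-walk chain (illustrative only, not from the paper):
# from state s move left/right with equal probability; falling off the
# right end yields reward 1, off the left end reward 0.
rng = np.random.default_rng(0)
n_states = 5
features = np.eye(n_states)  # tabular features as a special linear parameterization


def step(s):
    """One transition; returns (next_state_or_None, reward)."""
    s2 = s + (1 if rng.random() < 0.5 else -1)
    if s2 < 0:
        return None, 0.0
    if s2 >= n_states:
        return None, 1.0
    return s2, 0.0


# TD(0) with linear function approximation: V(s) ~ theta . phi(s)
theta = np.zeros(n_states)
alpha, gamma = 0.1, 1.0
for _ in range(5000):
    s = n_states // 2  # start each episode in the middle state
    while s is not None:
        s2, r = step(s)
        v_next = 0.0 if s2 is None else theta @ features[s2]
        td_error = r + gamma * v_next - theta @ features[s]
        theta += alpha * td_error * features[s]
        s = s2
```

For this chain the true values are V(i) = (i+1)/6, so `theta` should approach roughly [0.17, 0.33, 0.5, 0.67, 0.83] up to step-size noise.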
Related Items
Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
Accelerated and Instance-Optimal Policy Evaluation with Linear Function Approximation
Approximate policy iteration: a survey and some new methods
A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
An incremental off-policy search in a model-free Markov decision process using a single sample path
Adaptive importance sampling for control and inference
An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method
Solving average cost Markov decision processes by means of a two-phase time aggregation algorithm
Multiscale Q-learning with linear function approximation
Perspectives of approximate dynamic programming
Reinforcement learning based algorithms for average cost Markov decision processes
Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
Hybrid MDP based integrated hierarchical Q-learning
Simple and Optimal Methods for Stochastic Variational Inequalities, II: Markovian Noise and Policy Evaluation in Reinforcement Learning
From Infinite to Finite Programs: Explicit Error Bounds with Applications to Approximate Dynamic Programming
Stochastic recursive inclusions with non-additive iterate-dependent Markov noise
An approximate dynamic programming approach to the admission control of elective patients
Neural circuits for learning context-dependent associations of stimuli
Rationality and intelligence
Energy contracts management by stochastic programming techniques
An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
A Small Gain Analysis of Single Timescale Actor Critic
Risk-Sensitive Reinforcement Learning via Policy Gradient Search
Variance-constrained actor-critic algorithms for discounted and average reward MDPs
Finite-time convergence rates of distributed local stochastic approximation
Convergence of stochastic approximation via martingale and converse Lyapunov methods
A reinforcement learning adaptive fuzzy controller for robots
A Lyapunov-based version of the value iteration algorithm formulated as a discrete-time switched affine system
Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality
Uncovering instabilities in variational-quantum deep Q-networks
Gradient temporal-difference learning for off-policy evaluation using emphatic weightings
Target Network and Truncation Overcome the Deadly Triad in \(\boldsymbol{Q}\)-Learning
From Reinforcement Learning to Deep Reinforcement Learning: An Overview
Premium control with reinforcement learning
Robust reinforcement learning control with static and dynamic stability
Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning
Reinforcement learning algorithms with function approximation: recent advances and applications
Asymptotic analysis of value prediction by well-specified and misspecified models
Toward Nonlinear Local Reinforcement Learning Rules Through Neuroevolution
A Q-learning predictive control scheme with guaranteed stability
Approximate dynamic programming for link scheduling in wireless mesh networks
Temporal difference-based policy iteration for optimal control of stochastic systems
A \(Sarsa(\lambda)\) algorithm based on double-layer fuzzy reasoning
On Generalized Bellman Equations and Temporal-Difference Learning
Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme
Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
A review on deep reinforcement learning for fluid mechanics
Quadratic approximate dynamic programming for input-affine systems
Stochastic approximation
Basis function adaptation in temporal difference reinforcement learning
The Borkar-Meyn theorem for asynchronous stochastic approximations
On-policy concurrent reinforcement learning
Parallel dynamic water supply scheduling in a cluster of computers
Real-time reinforcement learning by sequential actor-critics and experience replay
An actor-critic algorithm for constrained Markov decision processes
Proximal algorithms and temporal difference methods for solving fixed point problems
Stochastic approximation algorithms: overview and recent trends
Adaptive critic design with graph Laplacian for online learning control of nonlinear systems
Relational Sequence Learning
Chaotic dynamics and convergence analysis of temporal difference algorithms with bang-bang control
Continuous-Time Robust Dynamic Programming
On the Asymptotic Equivalence Between Differential Hebbian and Temporal Difference Learning
Off-policy temporal difference learning with distribution adaptation in fast mixing chains
A formal framework and extensions for function approximation in learning classifier systems
Projected equation methods for approximate solution of large linear systems
Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling
Bayesian Exploration for Approximate Dynamic Programming
Variance Regularization in Sequential Bayesian Optimization
Transmission scheduling for multi-process multi-sensor remote estimation via approximate dynamic programming
Reinforcement distribution in fuzzy Q-learning
Deep reinforcement learning for inventory control: a roadmap
Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation
Natural actor-critic algorithms
A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation
On the existence of fixed points for approximate value iteration and temporal-difference learning
The single-node dynamic service scheduling and dispatching problem
Fundamental design principles for reinforcement learning algorithms
FLOW SHOP SCHEDULING WITH REINFORCEMENT LEARNING
Concentration of Contractive Stochastic Approximation and Reinforcement Learning
Actor-Critic Algorithms with Online Feature Adaptation
High-order fully actuated system approaches: Part VIII. Optimal control with application in spacecraft attitude stabilisation