
Reinforcement learning with replacing eligibility traces

From MaRDI portal
Publication:1911343

DOI: 10.1007/BF00114726
zbMath: 0843.68094
MaRDI QID: Q1911343

Richard S. Sutton, Satinder Pal Singh

Publication date: 13 August 1996

Published in: Machine Learning


zbMATH Keywords

Monte Carlo methods, reinforcement learning, temporal difference learning, eligibility trace, replacing trace


Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05)


Related Items

  • Guiding exploration by pre-existing knowledge without modifying reward
  • The optimal unbiased value estimator and its relation to LSTD, TD and MC
  • Risk-averse policy optimization via risk-neutral policy optimization
  • A Gentle Introduction to Reinforcement Learning



Cites Work

  • Asynchronous stochastic approximation and Q-learning
  • Practical issues in temporal difference learning
  • The convergence of \(TD(\lambda)\) for general \(\lambda\)
  • Temporal-difference methods and Markov models
  • On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
  • A Note on the Inversion of Matrices by Random Walks
  • Unnamed Item (4 entries)