scientific article; zbMATH DE number 5037124 - MaRDI portal

Publication:5477863

DOI: 10.1023/A:1018064306595 ⋮ zbMath: 1099.68692 ⋮ MaRDI QID: Q5477863

Sridhar Mahadevan

Publication date: 29 June 2006

Published in: Machine Learning

Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.

zbMATH Keywords

Markov decision processes ⋮ Reinforcement learning

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05) ⋮ Stochastic learning and adaptive control (93E35) ⋮ Markov and semi-Markov decision processes (90C40)

Related Items

Hybrid MDP based integrated hierarchical Q-learning ⋮ Job control in heterogeneous computing systems ⋮ Model-based average reward reinforcement learning ⋮ Reinforcement learning for joint pricing, lead-time and scheduling decisions in make-to-order systems ⋮ \(R(\lambda)\) imitation learning for automatic generation control of interconnected power grids ⋮ Multi-agent natural actor-critic reinforcement learning algorithms ⋮ Optimal Curiosity-Driven Modular Incremental Slow Feature Analysis ⋮ Reinforcement learning for long-run average cost. ⋮ SOLVING DYNAMIC WILDLIFE RESOURCE OPTIMIZATION PROBLEMS USING REINFORCEMENT LEARNING ⋮ Analyzing anonymity attacks through noisy channels ⋮ Unnamed Item ⋮ Minimizing mean weighted tardiness in unrelated parallel machine scheduling with reinforcement learning ⋮ Long-Term Reward Prediction in TD Models of the Dopamine System ⋮ A construction algorithm for designing guide paths of automated guided vehicle systems ⋮ Importance sampling in reinforcement learning with an estimated behavior policy ⋮ A Neurocomputational Model for Cocaine Addiction ⋮ Off-Policy Estimation of Long-Term Average Outcomes With Applications to Mobile Health ⋮ Batch policy learning in average reward Markov decision processes