An Incremental Fast Policy Search Using a Single Sample Path
From MaRDI portal
Publication:5045345
DOI: 10.1007/978-3-319-69900-4_1
zbMath: 1498.68241
OpenAlex: W2765274790
MaRDI QID: Q5045345
Ajin George Joseph, Shalabh Bhatnagar
Publication date: 4 November 2022
Published in: Lecture Notes in Computer Science
Full work available at URL: https://doi.org/10.1007/978-3-319-69900-4_1
- Learning and adaptive systems in artificial intelligence (68T05)
- Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) (60J20)
- Markov and semi-Markov decision processes (90C40)
Cites Work
- Unnamed Item
- Unnamed Item
- The cross-entropy method for continuous multi-extremal optimization
- The cross-entropy method for combinatorial and continuous optimization
- Basis function adaptation in temporal difference reinforcement learning
- An iterative method of solving a game
- The Complexity of Markov Decision Processes
- Cross-entropy and rare events for maximal cut and partition problems