A novel policy based on action confidence limit to improve exploration efficiency in reinforcement learning
From MaRDI portal
Publication:6121659
DOI: 10.1016/J.INS.2023.119011
MaRDI QID: Q6121659
Xinyang Deng, Yixin He, Wen Jiang, Fanghui Huang
Publication date: 26 March 2024
Published in: Information Sciences
reinforcement learning; action confidence limit; deep auto-encoder network; exploration policy; uncertainty of action
Nonparametric tolerance and confidence regions (62G15); Learning and adaptive systems in artificial intelligence (68T05); Source coding (94A29)
Cites Work
- Unnamed Item
- \({\mathcal Q}\)-learning
- Exploration of multi-state environments: Local measures and back-propagation of uncertainty
- An online-learning-based evolutionary many-objective algorithm
- AnD: a many-objective evolutionary algorithm with angle-based selection and shift-based density estimation
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
- An adaptive polyploid memetic algorithm for scheduling trucks at a cross-docking terminal