Explicit explore, exploit, or escape \((E^4)\): near-optimal safety-constrained reinforcement learning in polynomial time
From MaRDI portal
Publication:6106432
DOI: 10.1007/s10994-022-06201-z
arXiv: 2111.07395
MaRDI QID: Q6106432
Nicholas Bishop, David M. Bossens
Publication date: 27 June 2023
Published in: Machine Learning
Full work available at URL: https://arxiv.org/abs/2111.07395
Keywords: model-based reinforcement learning; constrained Markov decision processes; robust Markov decision processes; safe exploration; safe artificial intelligence
Cites Work
- A new polynomial-time algorithm for linear programming
- Constrained Markov decision processes with total cost criteria: Lagrangian approach and dual linear program
- Interior-point methods
- Near-optimal reinforcement learning in polynomial time
- \({\mathcal Q}\)-learning
- A Block Coordinate Descent Method for Regularized Multiconvex Optimization with Applications to Nonnegative Tensor Factorization and Completion
- 10.1162/153244303765208377
- Robust Markov Decision Processes
- Robust Control of Markov Decision Processes with Uncertain Transition Matrices
- Probability Inequalities for Sums of Bounded Random Variables
- Robust Dynamic Programming