Adaptive policy-iteration and policy-value-iteration for discounted Markov decision processes

Publication:3984139