UVIP: Model-Free Approach to Evaluate Reinforcement Learning Algorithms


arXiv: 2105.02135
MaRDI QID: Q6366911

Author name not available

Publication date: 5 May 2021

Abstract: Policy evaluation is an important instrument for the comparison of different algorithms in Reinforcement Learning (RL). Yet even precise knowledge of the value function $V^\pi$ corresponding to a policy $\pi$ does not provide reliable information on how far the policy $\pi$ is from the optimal one. We present a novel model-free upper value iteration procedure (UVIP) that allows us to estimate the suboptimality gap $V^*(x) - V^\pi(x)$ from above and to construct confidence intervals for $V^*$. Our approach relies on upper bounds on the solution of the Bellman optimality equation via a martingale approach. We provide theoretical guarantees for UVIP under general assumptions and illustrate its performance on a number of benchmark RL problems.
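To make the quantities in the abstract concrete, below is a minimal, self-contained sketch (the toy MDP, its numbers, and all names are illustrative assumptions, not taken from the paper). It computes $V^\pi$ by policy evaluation and $V^*$ by value iteration on a small MDP with known dynamics, then prints the suboptimality gap $V^*(x) - V^\pi(x)$, i.e. the quantity UVIP estimates from above. Note that this is textbook model-based dynamic programming for illustration only; UVIP itself is model-free and, per the abstract, works with sampled upper bounds on the solution of the Bellman optimality equation via a martingale approach (see the companion repository below for the authors' implementation).

import numpy as np

# Toy 2-state, 2-action MDP (all numbers are illustrative assumptions).
n_states, n_actions, gamma = 2, 2, 0.9
# P[s, a, s'] transition probabilities, R[s, a] expected rewards
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])

def policy_evaluation(pi, tol=1e-10):
    # Fixed-point iteration on the Bellman equation for V^pi.
    V = np.zeros(n_states)
    while True:
        V_new = np.array([R[s, pi[s]] + gamma * P[s, pi[s]] @ V
                          for s in range(n_states)])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def value_iteration(tol=1e-10):
    # Fixed-point iteration on the Bellman optimality equation for V*.
    V = np.zeros(n_states)
    while True:
        V_new = np.max(R + gamma * P @ V, axis=1)  # max over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

pi = np.array([0, 0])  # a deliberately suboptimal deterministic policy
V_pi = policy_evaluation(pi)
V_star = value_iteration()
print("V^pi :", V_pi)
print("V*   :", V_star)
print("gap V*(x) - V^pi(x):", V_star - V_pi)  # the quantity UVIP bounds from above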




Has companion code repository: https://github.com/human0being/uvip-rl
