Lenient Regret and Good-Action Identification in Gaussian Process Bandits

From MaRDI portal
Publication:6360356

arXiv2102.05793MaRDI QIDQ6360356

Author name not available (Why is that?)

Publication date: 10 February 2021

Abstract: In this paper, we study the problem of Gaussian process (GP) bandits under relaxed optimization criteria stating that any function value above a certain threshold is "good enough". On the theoretical side, we study various {em lenient regret} notions in which all near-optimal actions incur zero penalty, and provide upper bounds on the lenient regret for GP-UCB and an elimination algorithm, circumventing the usual O(sqrtT) term (with time horizon T) resulting from zooming extremely close towards the function maximum. In addition, we complement these upper bounds with algorithm-independent lower bounds. On the practical side, we consider the problem of finding a single "good action" according to a known pre-specified threshold, and introduce several good-action identification algorithms that exploit knowledge of the threshold. We experimentally find that such algorithms can often find a good action faster than standard optimization-based approaches.




Has companion code repository: https://github.com/caitree/GoodAction








This page was built for publication: Lenient Regret and Good-Action Identification in Gaussian Process Bandits

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q6360356)