Improved regret for zeroth-order adversarial bandit convex optimisation (Q2035748)

From MaRDI portal
scientific article
Language Label Description Also known as
English
Improved regret for zeroth-order adversarial bandit convex optimisation
scientific article

    Statements

    Improved regret for zeroth-order adversarial bandit convex optimisation (English)
    0 references
    0 references
    25 June 2021
    0 references
    Summary: We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most \(O(d^{2.5} \sqrt{n} \log(n))\), where \(d\) is the dimension and \(n\) is the number of interactions. This improves on the bound of \(O(d^{9.5} \sqrt{n} \log(n)^{7.5})\) by \textit{S. Bubeck} et al. [in: Proceedings of the 49th annual ACM SIGACT symposium on theory of computing, STOC 2017. New York, NY: Association for Computing Machinery (ACM). 72--85 (2017; Zbl 1370.90175)]. The proof is based on identifying an improved exploratory distribution for convex functions.
    0 references
    bandit convex optimisation
    0 references
    online learning
    0 references

    Identifiers