Improved regret for zeroth-order adversarial bandit convex optimisation (Q2035748)
From MaRDI portal
scientific article
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Improved regret for zeroth-order adversarial bandit convex optimisation |
scientific article |
Statements
Improved regret for zeroth-order adversarial bandit convex optimisation (English)
0 references
25 June 2021
0 references
Summary: We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most \(O(d^{2.5} \sqrt{n} \log(n))\), where \(d\) is the dimension and \(n\) is the number of interactions. This improves on the bound of \(O(d^{9.5} \sqrt{n} \log(n)^{7.5})\) by \textit{S. Bubeck} et al. [in: Proceedings of the 49th annual ACM SIGACT symposium on theory of computing, STOC 2017. New York, NY: Association for Computing Machinery (ACM). 72--85 (2017; Zbl 1370.90175)]. The proof is based on identifying an improved exploratory distribution for convex functions.
0 references
bandit convex optimisation
0 references
online learning
0 references