Random Function Descent

From MaRDI portal
Publication:6435032

arXiv: 2305.01377
MaRDI QID: Q6435032

Author name not available

Publication date: 2 May 2023

Abstract: While gradient-based methods are ubiquitous in machine learning, selecting the right step size often requires "hyperparameter tuning". This is because backtracking procedures like Armijo's rule depend on quality evaluations at every step, which are not available in a stochastic context. Since optimization schemes can be motivated using Taylor approximations, we replace the Taylor approximation with the conditional expectation (the best L2 estimator) and propose "Random Function Descent" (RFD). Under light assumptions common in Bayesian optimization, we prove that RFD is identical to gradient descent, but with calculable step sizes, even in a stochastic context. We beat untuned Adam in synthetic benchmarks and, to close the performance gap to tuned Adam, propose a heuristic extension that is competitive with it.

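To make the idea in the abstract concrete, here is a minimal sketch of gradient descent whose step size is computed from a formula rather than tuned by hand. The helper `rfd_step_size` is a hypothetical placeholder, not the step-size formula derived in the paper and not the pyrfd API; the companion repository linked below contains the actual implementation.

```python
import numpy as np


def rfd_step_size(grad_norm, scale=1.0):
    # Hypothetical placeholder: in the paper the step size is derived from a
    # covariance model of the random loss; this stand-in only illustrates that
    # the step size is computed rather than hand-tuned.
    return scale / (1.0 + grad_norm)


def descend(grad_fn, x0, steps=100):
    # Plain gradient descent with a learning rate that comes from a formula
    # instead of a tuned hyperparameter.
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_fn(x)
        x = x - rfd_step_size(np.linalg.norm(g)) * g
    return x


if __name__ == "__main__":
    # Toy quadratic f(x) = ||x - 3||^2 as a stand-in for a stochastic loss.
    grad = lambda x: 2.0 * (x - 3.0)
    print(descend(grad, x0=np.zeros(2)))  # approaches [3., 3.]
```

The point of RFD, as the abstract states, is that under the assumed covariance model such a step size can actually be calculated rather than guessed, removing the learning-rate hyperparameter.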

Companion code repository: https://github.com/FelixBenning/pyrfd
