Empirically explaining SGD from a line search perspective

arXiv: 2103.17132
MaRDI QID: Q6364248

Author name not available

Publication date: 31 March 2021

Abstract: Optimization in Deep Learning is mainly guided by vague intuitions and strong assumptions, with only a limited understanding of how and why they work in practice. To shed more light on this, our work provides a deeper understanding of how SGD behaves by empirically analyzing the trajectory taken by SGD from a line search perspective. Specifically, we perform a costly quantitative analysis of the full-batch loss along SGD trajectories of commonly used models trained on a subset of CIFAR-10. Our core results include that the full-batch loss along lines in the update step direction is highly parabolic. Furthermore, we show that there exists a learning rate with which SGD always performs almost exact line searches on the full-batch loss. Finally, we provide a different perspective on why increasing the batch size has almost the same effect as decreasing the learning rate by the same factor.
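To make the abstract's measurement concrete, here is a minimal sketch (not the authors' published code; `model`, `full_batch_loss`, and `direction` are hypothetical placeholders) of sampling the full-batch loss along the line in the update step direction and fitting a parabola to the samples. Under the paper's observation that this loss is near-parabolic, the minimum s* = -b/(2a) of the fitted parabola l(s) ≈ a·s² + b·s + c is where an exact line search would land.

```python
# Minimal sketch, assuming a PyTorch model and a function
# full_batch_loss(model) that evaluates the loss on the entire
# training set. All names here are hypothetical placeholders.
import numpy as np
import torch


def loss_along_line(model, full_batch_loss, direction, steps):
    """Evaluate the full-batch loss at theta + s * direction for each s."""
    params = [p for p in model.parameters() if p.requires_grad]
    originals = [p.detach().clone() for p in params]
    losses = []
    with torch.no_grad():
        for s in steps:
            for p, p0, d in zip(params, originals, direction):
                p.copy_(p0 + s * d)
            losses.append(float(full_batch_loss(model)))
        for p, p0 in zip(params, originals):  # restore the original weights
            p.copy_(p0)
    return np.asarray(losses)


# Hypothetical usage: after a backward pass on one mini-batch, the SGD
# update direction is the negative mini-batch gradient.
# direction = [-p.grad.detach().clone()
#              for p in model.parameters() if p.requires_grad]
# steps = np.linspace(0.0, 0.5, 21)
# losses = loss_along_line(model, full_batch_loss, direction, steps)
# a, b, c = np.polyfit(steps, losses, 2)  # fit l(s) ~= a*s^2 + b*s + c
# s_star = -b / (2 * a)  # step size an exact line search would choose
```

In this picture, a fixed learning rate close to s* makes each SGD step behave like an almost exact line search on the full-batch loss, which is the abstract's second result.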

Has companion code repository: https://github.com/cogsys-tuebingen/empirically_explaining_sgd_from_a_line_search_perspective