Early Stage Convergence and Global Convergence of Training Mildly Parameterized Neural Networks

MaRDI publication: Q6401163

arXiv: 2206.02139

Author name not available

Publication date: 5 June 2022

Abstract: The convergence of GD and SGD when training mildly parameterized neural networks starting from random initialization is studied. For a broad range of models and loss functions, including the most commonly used square loss and cross entropy loss, we prove an "early stage convergence" result. We show that the loss decreases by a significant amount, and quickly, in the early stage of training. Furthermore, for exponential-type loss functions, and under some assumptions on the training data, we show global convergence of GD. Instead of relying on extreme over-parameterization, our study is based on a microscopic analysis of the activation patterns of the neurons, which helps us derive more powerful lower bounds for the gradient. The results on activation patterns, which we call the "neuron partition", help build intuition for understanding the behavior of neural networks' training dynamics, and may be of independent interest.
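As an illustration of the setting described in the abstract (not the authors' implementation; see the companion code repository below for that), the following minimal NumPy sketch runs gradient descent with square loss on a small two-layer ReLU network and logs both the early-stage loss decrease and the number of distinct neuron activation patterns on the training data. All names, dimensions, and hyperparameters here are illustrative assumptions, not values taken from the paper.

# Hypothetical sketch: GD on a two-layer ReLU network with square loss,
# tracking the early-stage loss and the neuron activation patterns.
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 20, 5, 50                               # samples, input dim, hidden width (illustrative)
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # put the data on the unit sphere
y = rng.standard_normal(n)

W = rng.standard_normal((m, d)) / np.sqrt(d)      # random initialization of the hidden layer
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed output-layer weights

def forward(W):
    pre = X @ W.T                                 # (n, m) pre-activations
    act = np.maximum(pre, 0.0)                    # ReLU activations
    return pre, act @ a                           # pre-activations and predictions

lr = 0.1
for step in range(200):
    pre, pred = forward(W)
    err = pred - y
    loss = 0.5 * np.mean(err ** 2)                # square loss
    # gradient of the loss w.r.t. the hidden-layer weights
    grad = ((err[:, None] * (pre > 0) * a[None, :]).T @ X) / n
    W -= lr * grad
    if step % 50 == 0:
        # activation pattern of each neuron: on which samples it is active
        pattern = pre > 0
        n_patterns = len({tuple(col) for col in pattern.T})
        print(f"step {step:3d}  loss {loss:.4f}  distinct activation patterns {n_patterns}")

The printed trace makes the early, fast loss decrease visible, and the count of distinct activation patterns is one simple proxy for the "neuron partition" viewpoint mentioned in the abstract.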




Has companion code repository: https://github.com/wmz9/early_stage_convergence_neurips2022








