Overparameterization of deep ResNet: zero loss and mean-field analysis - MaRDI portal

Publication:6368900

arXiv: 2105.14417, MaRDI QID: Q6368900

Zhiyan Ding, Shi Chen, Stephen J. Wright, Qin Li

Publication date: 29 May 2021

Abstract: Finding parameters in a deep neural network (NN) that fit training data is a nonconvex optimization problem, but a basic first-order optimization method (gradient descent) finds a global optimizer with perfect fit (zero-loss) in many practical situations. We examine this phenomenon for the case of Residual Neural Networks (ResNet) with smooth activation functions in a limiting regime in which both the number of layers (depth) and the number of weights in each layer (width) go to infinity. First, we use a mean-field-limit argument to prove that the gradient descent for parameter training becomes a gradient flow for a probability distribution that is characterized by a partial differential equation (PDE) in the large-NN limit. Next, we show that under certain assumptions, the solution to the PDE converges in the training time to a zero-loss solution. Together, these results suggest that the training of the ResNet gives a near-zero loss if the ResNet is large enough. We give estimates of the depth and width needed to reduce the loss below a given threshold, with high probability.
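To make the setting in the abstract concrete, here is a minimal sketch of a discrete ResNet forward pass with depth L and width M. The residual update, the 1/(L*M) averaging factor, the tanh activation, and the names resnet_forward and squared_loss are all illustrative assumptions; none of them appears on this page, and the paper's exact parameterization may differ. The sketch only shows the kind of network whose large-depth, large-width limit is being studied.

```python
import numpy as np

def resnet_forward(z0, W, U, b):
    """Hypothetical residual network with mean-field style averaging.

    Assumed update (not taken from the source page):
        z_{l+1} = z_l + 1/(L*M) * sum_m U[l, m] * tanh(W[l, m] . z_l + b[l, m])

    Shapes: z0 is (d,), W and U are (L, M, d), b is (L, M).
    """
    L, M = b.shape
    z = np.asarray(z0, dtype=float)
    for l in range(L):
        pre = W[l] @ z + b[l]                      # (M,) pre-activations of layer l
        z = z + (U[l].T @ np.tanh(pre)) / (L * M)  # averaged residual update
    return z

def squared_loss(z_out, y):
    """Quadratic misfit between the network output and a target y."""
    diff = np.asarray(z_out, dtype=float) - np.asarray(y, dtype=float)
    return 0.5 * float(np.sum(diff ** 2))

# Usage: depth 4, width 3, input dimension 2.
L, M, d = 4, 3, 2
z0 = np.array([1.0, -2.0])
W = np.zeros((L, M, d))
U = np.ones((L, M, d))
b = np.full((L, M), 0.5)
out = resnet_forward(z0, W, U, b)
print(out, squared_loss(out, z0))
```

With W set to zero, every layer adds the same increment tanh(0.5)/L to each coordinate, so the output does not depend on how large L and M are. That invariance is the reason for choosing an averaged scaling in this sketch: it keeps the output of the network bounded as depth and width grow.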
