Gradient Descent with Identity Initialization Efficiently Learns Positive-Definite Linear Transformations by Deep Residual Networks
Publication: 5154121
DOI: 10.1162/neco_a_01164
zbMath: 1475.68311
arXiv: 1802.06093
OpenAlex: W2911153392
Wikidata: Q91053158
Scholia: Q91053158
MaRDI QID: Q5154121
Peter L. Bartlett, Philip M. Long, David P. Helmbold
Publication date: 1 October 2021
Published in: Neural Computation
Full work available at URL: https://arxiv.org/abs/1802.06093
Related Items (7):
Effects of depth, width, and initialization: A convergence analysis of layer-wise training for deep linear neural networks
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
A survey on deep matrix factorizations
Gradient descent optimizes over-parameterized deep ReLU networks
Gradient descent for deep matrix factorization: dynamics and implicit bias towards low rank
Unnamed Item
Every Local Minimum Value Is the Global Minimum Value of Induced Model in Nonconvex Machine Learning