Effects of depth, width, and initialization: A convergence analysis of layer-wise training for deep linear neural networks
Publication: 5037872
DOI: 10.1142/S0219530521500263
zbMath: 1487.68201
arXiv: 1910.05874
MaRDI QID: Q5037872
Publication date: 4 March 2022
Published in: Analysis and Applications
Full work available at URL: https://arxiv.org/abs/1910.05874
MSC classification
- Artificial neural networks and deep learning (68T07)
- Numerical mathematical programming methods (65K05)
- Applications of mathematical programming (90C90)
- Nonconvex programming, global optimization (90C26)
- Numerical linear algebra (65F99)
Cites Work
- A coordinate gradient descent method for nonsmooth separable minimization
- Randomized Kaczmarz solver for noisy linear systems
- Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization
- A randomized Kaczmarz algorithm with exponential convergence
- Gradient descent optimizes over-parameterized deep ReLU networks
- Theory of deep convolutional neural networks: downsampling
- Universality of deep convolutional neural networks
- Randomized Extended Kaczmarz for Solving Least Squares
- Reducing the Dimensionality of Data with Neural Networks
- Randomized Methods for Linear Constraints: Convergence Rates and Conditioning
- Gradient Descent with Identity Initialization Efficiently Learns Positive-Definite Linear Transformations by Deep Residual Networks