scientific article; zbMATH DE number 7008326

From MaRDI portal
Publication:4614113

zbMath 1477.62192 · arXiv 1710.10345 · MaRDI QID Q4614113

Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Daniel Soudry, Nathan Srebro

Publication date: 30 January 2019

Full work available at URL: https://arxiv.org/abs/1710.10345

Title: The implicit bias of gradient descent on separable data
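For context, the arXiv paper behind this record proves that gradient descent on the logistic loss over linearly separable data diverges in norm while its direction converges to the L2 max-margin (hard-margin SVM) separator. The following is a minimal NumPy sketch of that behavior on hypothetical synthetic data; the blob means, learning rate, and step counts are illustrative choices, not taken from the record.

```python
# Minimal sketch (illustrative, not from the zbMATH record): gradient
# descent on the logistic loss over linearly separable data. The weight
# norm ||w|| grows without bound, while the direction w/||w|| converges
# toward the L2 max-margin separator, matching the paper's main result.
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated Gaussian blobs; labels y in {-1, +1}.
X = np.vstack([rng.normal(+2.0, 0.5, size=(50, 2)),
               rng.normal(-2.0, 0.5, size=(50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

w = np.zeros(2)
lr = 0.1
for t in range(1, 100_001):
    margins = y * (X @ w)
    # Gradient of the mean logistic loss log(1 + exp(-margin)):
    # -mean_i y_i x_i / (1 + exp(margin_i)); clip to avoid overflow.
    sigma = 1.0 / (1.0 + np.exp(np.clip(margins, -60.0, 60.0)))
    grad = -(X * (y * sigma)[:, None]).mean(axis=0)
    w -= lr * grad
    if t in (10, 1_000, 100_000):
        print(f"step {t:>6}: ||w|| = {np.linalg.norm(w):7.3f}, "
              f"direction = {w / np.linalg.norm(w)}")
```

Running the sketch shows the norm growing (roughly logarithmically in t) while the printed direction stabilizes, which is the "implicit bias" the title refers to.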



Related Items

Deep learning: a statistical viewpoint
Spurious Valleys in Two-layer Neural Network Optimization Landscapes
Two-Layer Neural Networks with Values in a Banach Space
Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting, and Regularization
Prevalence of neural collapse during the terminal phase of deep learning training
Theoretical issues in deep networks
The inverse variance–flatness relation in stochastic gradient descent is critical for finding flat minima
Geometry of Linear Convolutional Networks
Implicit regularization in nonconvex statistical estimation: gradient descent converges linearly for phase retrieval, matrix completion, and blind deconvolution
Solving Elliptic Problems with Singular Sources Using Singularity Splitting Deep Ritz Method
Tractability from overparametrization: the example of the negative perceptron
Gradient descent on infinitely wide neural networks: global convergence and generalization
Generalized gradients in dynamic optimization, optimal control, and machine learning problems
Discussion of: "Nonparametric regression using deep neural networks with ReLU activation function"
Rejoinder: "Nonparametric regression using deep neural networks with ReLU activation function"
Accelerating flash calculation through deep learning methods
Generalization Error in Deep Learning
Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks
Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions
Four heads are better than three
A selective overview of deep learning
Linearized two-layers neural networks in high dimension
Bias of homotopic gradient descent for the hinge loss
Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness
On the Purity and Entropy of Mixed Gaussian States
Implicit Regularization and Momentum Algorithms in Nonlinearly Parameterized Adaptive Control and Prediction
On the robustness of minimum norm interpolators and regularized empirical risk minimizers
Measurement error models: from nonparametric methods to deep neural networks
Scaling description of generalization with number of parameters in deep learning
Dynamics of stochastic gradient descent for two-layer neural networks in the teacher–student setup*
AdaBoost and robust one-bit compressed sensing
Stable recovery of entangled weights: towards robust identification of deep neural networks from minimal samples
On the perceptron's compression
An analytic theory of shallow networks dynamics for hinge loss classification*
Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification*
From inexact optimization to learning via gradient concentration
Implicit regularization with strongly convex bias: Stability and acceleration


Cites Work