DOI: 10.1137/16M1080173 · zbMath: 1397.65085 · arXiv: 1606.04838 · OpenAlex: W2963433607 · Wikidata: Q89144557 · Scholia: Q89144557 · MaRDI QID: Q4641709
Léon Bottou, Frank E. Curtis, Jorge Nocedal
Publication date: 18 May 2018
Published in: SIAM Review
Full work available at URL: https://arxiv.org/abs/1606.04838
Related Items:
Stochastic algorithms for self-consistent calculations of electronic structures,
Spurious Valleys in Two-layer Neural Network Optimization Landscapes,
A Stochastic Levenberg--Marquardt Method Using Random Models with Complexity Results,
Continuous-Time Convergence Rates in Potential and Monotone Games,
A stochastic gradient descent approach with partitioned-truncated singular value decomposition for large-scale inverse problems of magnetic modulus data,
Imaging conductivity from current density magnitude using neural networks,
Stochastic gradient descent for linear inverse problems in Hilbert spaces,
Minibatch Forward-Backward-Forward Methods for Solving Stochastic Variational Inequalities,
Multiple-sets split quasi-convex feasibility problems: Adaptive subgradient methods with convergence guarantee,
Sublinear Convergence of a Tamed Stochastic Gradient Descent Method in Hilbert Space,
Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent,
slimTrain---A Stochastic Approximation Method for Training Separable Deep Neural Networks,
Lower bounds for non-convex stochastic optimization,
An adaptive stochastic sequential quadratic programming with differentiable exact augmented Lagrangians,
Stochastic gradient descent with noise of machine learning type. I: Discrete time analysis,
ALMOND: Adaptive Latent Modeling and Optimization via Neural Networks and Langevin Diffusion,
A stochastic gradient method for a class of nonlinear PDE-constrained optimal control problems under uncertainty,
Convergence analysis of a subsampled Levenberg-Marquardt algorithm,
Gauss-Newton method for solving linear inverse problems with neural network coders,
A nonlinear conjugate gradient method using inexact first-order information,
SCORE: approximating curvature information under self-concordant regularization,
A dual-based stochastic inexact algorithm for a class of stochastic nonsmooth convex composite problems,
Inequality constrained stochastic nonlinear optimization via active-set sequential quadratic programming,
A trust region method for noisy unconstrained optimization,
Principled deep neural network training through linear programming,
A distributed optimisation framework combining natural gradient with Hessian-free for discriminative sequence training,
Epistemic uncertainty quantification in deep learning classification by the delta method,
Convergence analysis of AdaBound with relaxed bound functions for non-convex optimization,
Asynchronous fully-decentralized SGD in the cluster-based model,
Risk-Sensitive Reinforcement Learning via Policy Gradient Search,
Stochastic momentum methods for non-convex learning without bounded assumptions,
Improved variance reduction extragradient method with line search for stochastic variational inequalities,
Three ways to solve partial differential equations with neural networks — A review,
An introduction to deep generative modeling,
An adaptive sampling augmented Lagrangian method for stochastic optimization with deterministic constraints,
Subgradient Sampling for Nonsmooth Nonconvex Minimization,
Parameter estimation in a 3‐parameter p‐star random graph model,
Finite-time convergence rates of distributed local stochastic approximation,
Structured learning of rigid‐body dynamics: A survey and unified view from a robotics perspective,
Adaptive stochastic gradient descent for optimal control of parabolic equations with random parameters,
A new stochastic gradient descent possibilistic clustering algorithm,
Solving Elliptic Problems with Singular Sources Using Singularity Splitting Deep Ritz Method,
Time discretization in the solution of parabolic PDEs with ANNs,
Scaling up stochastic gradient descent for non-convex optimisation,
An overview of stochastic quasi-Newton methods for large-scale machine learning,
A mini-batch proximal stochastic recursive gradient algorithm with diagonal Barzilai-Borwein stepsize,
A framework of convergence analysis of mini-batch stochastic projected gradient methods,
Derivation of coordinate descent algorithms from optimal control theory,
A New Certified Hierarchical and Adaptive RB-ML-ROM Surrogate Model for Parametrized PDEs,
Neural network-based limiter with transfer learning,
Stochastic mirror descent method for linear ill-posed problems in Banach spaces,
On mathematical optimization for clustering categories in contingency tables,
Globally Convergent Multilevel Training of Deep Residual Networks,
Convergence rates of the stochastic alternating algorithm for bi-objective optimization,
On the Generalized Langevin Equation for Simulated Annealing,
A gradient-based reinforcement learning model of market equilibration,
On the Convergence of Stochastic Gradient Descent for Nonlinear Ill-Posed Problems,
First-Order Methods for Nonconvex Quadratic Minimization,
Trust-region algorithms for training responses: machine learning methods using indefinite Hessian approximations,
Open Problem—Adaptive Constant-Step Stochastic Approximation,
MultiComposite Nonconvex Optimization for Training Deep Neural Networks,
A Class of Approximate Inverse Preconditioners Based on Krylov-Subspace Methods for Large-Scale Nonconvex Optimization,
Sample Complexity of Sample Average Approximation for Conditional Stochastic Optimization,
On the discrepancy principle for stochastic gradient descent,
An investigation of Newton-Sketch and subsampled Newton methods,
Fast Approximation of the Gauss--Newton Hessian Matrix for the Multilayer Perceptron,
A globally convergent gradient-like method based on the Armijo line search,
A homotopy training algorithm for fully connected neural networks,
A Distributed Optimal Control Problem with Averaged Stochastic Gradient Descent,
Train Like a (Var)Pro: Efficient Training of Neural Networks with Variable Projection,
Stochastic sub-sampled Newton method with variance reduction,
Learning the tangent space of dynamical instabilities from data,
Sparsity and level set regularization for near-field electromagnetic imaging in 3D,
Solving inverse problems using data-driven models,
Derivative-free optimization methods,
Data assimilation: The Schrödinger perspective,
Gradient Descent Finds the Cubic-Regularized Nonconvex Newton Step,
An Inertial Newton Algorithm for Deep Learning,
Deep Learning: An Introduction for Applied Mathematicians,
Adaptive Regularization Algorithms with Inexact Evaluations for Nonconvex Optimization,
A Stochastic Semismooth Newton Method for Nonsmooth Nonconvex Optimization,
Adaptive Sequential Sample Average Approximation for Solving Two-Stage Stochastic Linear Programs,
Stochastic proximal linear method for structured non-convex problems,
Splitting proximal with penalization schemes for additive convex hierarchical minimization problems,
Modes of Homogeneous Gradient Flows,
An elastic net penalized small area model combining unit- and area-level data for regional hypertension prevalence estimation,
Linear convergence of proximal incremental aggregated gradient method for nonconvex nonsmooth minimization problems,
Classification, inference and segmentation of anomalous diffusion with recurrent neural networks,
LSOS: Line-search second-order stochastic optimization methods for nonconvex finite sums,
Stochastic analysis of an adaptive cubic regularization method under inexact gradient evaluations and dynamic Hessian accuracy,
On a multilevel Levenberg–Marquardt method for the training of artificial neural networks and its application to the solution of partial differential equations,
A fully stochastic second-order trust region method,
Stochastic asymptotical regularization for linear inverse problems,
Global optimization issues in deep network regression: an overview,
Quasi-Newton methods for machine learning: forget the past, just sample,
Solving Stochastic Optimization with Expectation Constraints Efficiently by a Stochastic Augmented Lagrangian-Type Algorithm,
QNG: A Quasi-Natural Gradient Method for Large-Scale Statistical Learning,
Quantile-Based Iterative Methods for Corrupted Systems of Linear Equations,
Streaming constrained binary logistic regression with online standardized data,
ASTRO-DF: A Class of Adaptive Sampling Trust-Region Algorithms for Derivative-Free Stochastic Optimization,
Adaptive Sampling Strategies for Stochastic Optimization,
A Deep Learning Method for Elliptic Hemivariational Inequalities,
Stochastic quasi-Newton with line-search regularisation,
Optimal randomized classification trees,
Efficient and sparse neural networks by pruning weights in a multiobjective learning approach,
A framework for randomized time-splitting in linear-quadratic optimal control,
The exact worst-case convergence rate of the gradient method with fixed step lengths for \(L\)-smooth functions,
A semismooth Newton method for support vector classification and regression,
Stochastic proximal subgradient descent oscillates in the vicinity of its accumulation set,
DAS-PINNs: a deep adaptive sampling method for solving high-dimensional partial differential equations,
Generalized forward-backward splitting with penalization for monotone inclusion problems,
Exploiting negative curvature in deterministic and stochastic optimization,
Distributed nonconvex constrained optimization over time-varying digraphs,
An accelerated variance reducing stochastic method with Douglas-Rachford splitting,
On Sampling Rates in Simulation-Based Recursions,
Discriminative Bayesian filtering lends momentum to the stochastic Newton method for minimizing log-convex functions,
A limited-memory trust-region method for nonlinear optimization with many equality constraints,
Stable architectures for deep neural networks,
An abstract convergence framework with application to inertial inexact forward-backward methods,
Inexact gradient projection method with relative error tolerance,
Distributed stochastic gradient tracking methods with momentum acceleration for non-convex optimization,
On the asymptotic rate of convergence of stochastic Newton algorithms and their weighted averaged versions,
Halting time is predictable for large models: a universality property and average-case analysis,
A gradient descent method for solving a system of nonlinear equations,
Generalized gradients in dynamic optimization, optimal control, and machine learning problems,
Variance-Based Extragradient Methods with Line Search for Stochastic Variational Inequalities,
Optimization for deep learning: an overview,
A review on deep learning in medical image reconstruction,
How can machine learning and optimization help each other better?,
Bi-fidelity stochastic gradient descent for structural optimization under uncertainty,
Recovering missing CFD data for high-order discretizations using deep neural networks and dynamics learning,
Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders,
A linearly convergent stochastic recursive gradient method for convex optimization,
Derivative-Free Optimization of Noisy Functions via Quasi-Newton Methods,
Parallel Optimization Techniques for Machine Learning,
Backtracking gradient descent method and some applications in large scale optimisation. II: Algorithms and experiments,
Neural network regression for Bermudan option pricing,
Kernel-based online regression with canal loss,
On the regularizing property of stochastic gradient descent,
Accelerated gradient sliding for minimizing a sum of functions,
A review on deep reinforcement learning for fluid mechanics,
Nonasymptotic convergence of stochastic proximal point algorithms for constrained convex optimization,
Convergence of online mirror descent,
Non-asymptotic guarantees for sampling by stochastic gradient descent,
Feature uncertainty bounds for explicit feature maps and large robust nonlinear SVM classifiers,
Accelerated proximal incremental algorithm schemes for non-strongly convex functions,
Warped Riemannian Metrics for Location-Scale Models,
A robust multi-batch L-BFGS method for machine learning,
Sampled Tikhonov regularization for large linear inverse problems,
Hyper-parameter optimization for support vector machines using stochastic gradient descent and dual coordinate descent,
Stochastic sampling for deterministic structural topology optimization with many load cases: density-based and ground structure approaches,
MgNet: a unified framework of multigrid and convolutional neural network,
Making the Last Iterate of SGD Information Theoretically Optimal,
Deep relaxation: partial differential equations for optimizing deep neural networks,
Sequential Quadratic Optimization for Nonlinear Equality Constrained Stochastic Optimization,
Fokker--Planck Particle Systems for Bayesian Inference: Computational Approaches,
Adaptive Deep Learning for High-Dimensional Hamilton--Jacobi--Bellman Equations,
Partial differential equation regularization for supervised machine learning,
Incremental proximal gradient scheme with penalization for constrained composite convex optimization problems,
Data science vs. statistics: two cultures?,
Sampled limited memory methods for massive linear inverse problems,
Bilevel optimization, deep learning and fractional Laplacian regularization with applications in tomography,
Ghost Penalties in Nonconvex Constrained Optimization: Diminishing Stepsizes and Iteration Complexity,
A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation,
Finite-Time Analysis and Restarting Scheme for Linear Two-Time-Scale Stochastic Approximation,
Convergence of Recursive Stochastic Algorithms Using Wasserstein Divergence,
On stochastic Kaczmarz type methods for solving large scale systems of ill-posed equations,
High resolution 3D ultrasonic breast imaging by time-domain full waveform inversion,
An analysis of stochastic variance reduced gradient for linear inverse problems,
Tensor-Structured Sketching for Constrained Least Squares,
Optimization with learning-informed differential equation constraints and its applications,
Reconstructing the Thermal Phonon Transmission Coefficient at Solid Interfaces in the Phonon Transport Equation,
Fast Decentralized Nonconvex Finite-Sum Optimization with Recursive Variance Reduction,
Proximal Gradient Methods for Machine Learning and Imaging,
A Levenberg-Marquardt method for large nonlinear least-squares problems with dynamic accuracy in functions and gradients,
Adaptive two-layer ReLU neural network. I: Best least-squares approximation,
Adaptive two-layer ReLU neural network. II: Ritz approximation to elliptic PDEs,
On principal components regression, random projections, and column subsampling,
A deep learning algorithm for high-dimensional exploratory item factor analysis,
Stochastic gradient descent for semilinear elliptic equations with uncertainties,
Accelerating mini-batch SARAH by step size rules,
ODE-RU: a dynamical system view on recurrent neural networks,
On large batch training and sharp minima: a Fokker-Planck perspective,
Mathematical optimization in classification and regression trees,
A discussion on variational analysis in derivative-free optimization,
Stochastic quasi-subgradient method for stochastic quasi-convex feasibility problems,
An online conjugate gradient algorithm for large-scale data analysis in machine learning,
On obtaining sparse semantic solutions for inverse problems, control, and neural network training,
Self-adaptive deep neural network: numerical approximation to functions and PDEs,
The mixed deep energy method for resolving concentration features in finite strain hyperelasticity,
Adaptive deep density approximation for Fokker-Planck equations,
Feasibility-based fixed point networks,
Online statistical inference for parameters estimation with linear-equality constraints,
A stochastic extra-step quasi-Newton method for nonsmooth nonconvex optimization,
Finite-sum smooth optimization with SARAH,
Stochastic optimization using a trust-region method and random models,
Ritz-like values in steplength selections for stochastic gradient methods,
Model order reduction method based on (r)POD-ANNs for parameterized time-dependent partial differential equations,
Retracted: Model order reduction method based on machine learning for parameterized time-dependent partial differential equations,
Sub-linear convergence of a stochastic proximal iteration method in Hilbert space,
Interpreting rate-distortion of variational autoencoder and using model uncertainty for anomaly detection,
A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions,
Laplacian smoothing gradient descent,
Block layer decomposition schemes for training deep neural networks,
A distributed conjugate gradient online learning method over networks,
Subsampled nonmonotone spectral gradient methods,
A regularization interpretation of the proximal point method for weakly convex functions,
SHOPPER: a probabilistic model of consumer choice with substitutes and complements,
Accelerating incremental gradient optimization with curvature information,
Two-point step size gradient method for solving a deep learning problem,
Inexact restoration with subsampled trust-region methods for finite-sum minimization,
Recursive estimation for sparse Gaussian process regression,
A Bayesian perspective of statistical machine learning for big data,
Newton-type methods for non-convex optimization under inexact Hessian information,
Parallel sequential Monte Carlo for stochastic gradient-free nonconvex optimization,
ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels,
On variance reduction for stochastic smooth convex optimization with multiplicative noise,
Stochastic gradient descent with Polyak's learning rate,
Forward stability of ResNet and its variants,
ADMM-softmax: an ADMM approach for multinomial logistic regression,
Stochastic variance reduced gradient methods using a trust-region-like scheme,
Selection dynamics for deep neural networks,
Convergence of stochastic proximal gradient algorithm,
PPINN: parareal physics-informed neural network for time-dependent PDEs,
Analysis of biased stochastic gradient descent using sequential semidefinite programs,
A variation of Broyden class methods using Householder adaptive transforms,
Convergence of stochastic gradient descent in deep neural network,
Convergence analysis of neural networks for solving a free boundary problem,
Stochastic proximal gradient methods for nonconvex problems in Hilbert spaces,
Convergence rates for optimised adaptive importance samplers,
Optimization problems for machine learning: a survey,
A unified convergence analysis of stochastic Bregman proximal gradient and extragradient methods,
General convergence analysis of stochastic first-order methods for composite optimization,
Parallel decomposition methods for linearly constrained problems subject to simple bound with application to the SVMs training,
Non-convergence of stochastic gradient descent in the training of deep neural networks,
Conservative set valued fields, automatic differentiation, stochastic gradient methods and deep learning,
Computing Lyapunov functions using deep neural networks,
A stochastic subspace approach to gradient-free optimization in high dimensions,
Bias of homotopic gradient descent for the hinge loss,
Regularization parameter selection for the low rank matrix recovery,
Incremental without replacement sampling in nonconvex optimization,
Stochastic generalized gradient methods for training nonconvex nonsmooth neural networks,
Sequential convergence of AdaGrad algorithm for smooth convex optimization,
The generalized equivalence of regularization and min-max robustification in linear mixed models,
Adaptive optimization with periodic dither signals,
On large-scale unconstrained optimization and arbitrary regularization,
On the inexact scaled gradient projection method,
Finding best approximation pairs for two intersections of closed convex sets,
LSPIA, (stochastic) gradient descent, and parameter correction,
Quasi-convex feasibility problems: subgradient methods and convergence rates,
Remove the salt and pepper noise based on the high order total variation and the nuclear norm regularization,
Quantized convolutional neural networks through the lens of partial differential equations,
Convergence results of a nested decentralized gradient method for non-strongly convex problems,
On the local convergence of a stochastic semismooth Newton method for nonsmooth nonconvex optimization,
Dimension independent excess risk by stochastic gradient descent,
Variable metric proximal stochastic variance reduced gradient methods for nonconvex nonsmooth optimization,
SRKCD: a stabilized Runge-Kutta method for stochastic optimization,
Stopping criteria for, and strong convergence of, stochastic gradient descent on Bottou-Curtis-Nocedal functions,
Adaptive machine learning-based surrogate modeling to accelerate PDE-constrained optimization in enhanced oil recovery,
SABRINA: a stochastic subspace majorization-minimization algorithm,
Adaptive sampling line search for local stochastic optimization with integer variables,
Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning,
On the convergence of a block-coordinate incremental gradient method,
Triangularized orthogonalization-free method for solving extreme eigenvalue problems,
Tackling algorithmic bias in neural-network classifiers using Wasserstein-2 regularization,
A stochastic gradient algorithm with momentum terms for optimal control problems governed by a convection-diffusion equation with random diffusivity,
A subsampling approach for Bayesian model selection,
Convergence analysis of machine learning algorithms for the numerical solution of mean field control and games. II: The finite horizon case,
A stochastic first-order trust-region method with inexact restoration for finite-sum minimization,
A nested primal-dual FISTA-like scheme for composite convex optimization problems,
Inertial accelerated SGD algorithms for solving large-scale lower-rank tensor CP decomposition problems,
A deep domain decomposition method based on Fourier features,
Generating Nesterov's accelerated gradient algorithm by using optimal control theory for optimization,
Accelerating variance-reduced stochastic gradient methods,
New metrics and tests for subject prevalence in documents based on topic modeling,
A simplified convergence theory for Byzantine resilient stochastic gradient descent,
Regularized Newton Method with Global \({\boldsymbol{\mathcal{O}(1/{k}^2)}}\) Convergence,
Kinetic-based optimization enhanced by genetic dynamics,
Robust Accelerated Primal-Dual Methods for Computing Saddle Points,
New results in cooperative adaptive optimal output regulation,
Stochastic linear regularization methods: random discrepancy principle and applications,
A hybrid BB-type method for solving large scale unconstrained optimization,
Adaptive sampling stochastic multigradient algorithm for stochastic multiobjective optimization,
Levenberg-Marquardt revisited and parameter tuning of river regression models,
Theoretical analysis of Adam using hyperparameters close to one without Lipschitz smoothness,
On the complexity of a stochastic Levenberg-Marquardt method,
Convergence Analysis of a Quasi-Monte Carlo-Based Deep Learning Algorithm for Solving Partial Differential Equations,
A Convergence Study of SGD-Type Methods for Stochastic Optimization,
Stochastic projective splitting,
On maximum a posteriori estimation with Plug \& Play priors and stochastic gradient descent,
Nonlinear Gradient Mappings and Stochastic Optimization: A General Framework with Applications to Heavy-Tail Noise,
Boundary-safe PINNs extension: application to non-linear parabolic PDEs in counterparty credit risk,
A line search based proximal stochastic gradient algorithm with dynamical variance reduction,
Multi-agent natural actor-critic reinforcement learning algorithms,
Convergence of Random Reshuffling under the Kurdyka–Łojasiewicz Inequality,
Accelerated doubly stochastic gradient descent for tensor CP decomposition,
Hessian averaging in stochastic Newton methods achieves superlinear convergence,
Accelerating stochastic sequential quadratic programming for equality constrained optimization using predictive variance reduction,
Convergence of gradient algorithms for nonconvex \(C^{1+ \alpha}\) cost functions,
First-order methods for convex optimization,
Subsampling in ensemble Kalman inversion,
Deep neural network based adaptive learning for switched systems,
On the Convergence of Stochastic Gradient Descent for Linear Inverse Problems in Banach Spaces,
Adaptive sampling quasi-Newton methods for zeroth-order stochastic optimization,
FedHD: communication-efficient federated learning from hybrid data,
An Asymptotic Analysis of Random Partition Based Minibatch Momentum Methods for Linear Regression Models,
Statistical Analysis of Fixed Mini-Batch Gradient Descent Estimator,
Numerical Analysis for Convergence of a Sample-Wise Backpropagation Method for Training Stochastic Neural Networks,
Lagrangian and Hamiltonian dynamics for probabilities on the statistical bundle,
Neural architecture search via standard machine learning methodologies,
Continuous‐time stochastic gradient descent for optimizing over the stationary distribution of stochastic differential equations,
Adaptive proximal SGD based on new estimating sequences for sparser ERM,
Modified Stochastic Extragradient Methods for Stochastic Variational Inequality,
Proximal stochastic recursive momentum algorithm for nonsmooth nonconvex optimization problems,
Stochastic gradient descent: where optimization meets machine learning,
SGEM: stochastic gradient with energy and momentum,
Error assessment of an adaptive finite elements -- neural networks method for an elliptic parametric PDE,
Random-reshuffled SARAH does not need full gradient computations,
Recent Theoretical Advances in Non-Convex Optimization