A selective review on statistical methods for massive data computation: distributed computing, subsampling, and minibatch techniques
DOI: 10.1080/24754269.2024.2343151
MaRDI QID: Q6620576
Haobo Qi, Jing Zhou, Feifei Wang, Yuan Gao, Danyang Huang, Hong Chang, Yingqiu Zhu, Yingying Ma, Ke Xu, Xuetong Li, Shuyuan Wu, Rui Pan, Hansheng Wang, Xuening Zhu
Publication date: 17 October 2024
Published in: Statistical Theory and Related Fields
Cites Work
- A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems
- Divide and conquer local average regression
- Faster least squares approximation
- Bayesian variable selection in quantile regression
- One-step sparse estimates in nonconcave penalized likelihood models
- Some asymptotic theory for the bootstrap
- The jackknife estimate of variance
- Bootstrap methods: another look at the jackknife
- Distributed testing and estimation under sparse high dimensional models
- A distributed one-step estimator
- Distributed kernel-based gradient descent algorithms
- The jackknife and bootstrap
- Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods
- Distributed statistical inference for massive data
- Block average quantile regression for massive dataset
- Statistical inference for model parameters in stochastic gradient descent
- Bridging the gap between constant step size stochastic gradient descent and Markov chains
- Why random reshuffling beats stochastic gradient descent
- Distributed one-step upgraded estimation for non-uniformly and non-randomly distributed data
- Distributed estimation of principal eigenspaces
- Quantile regression under memory constraint
- Distributed simultaneous inference in generalized linear models via confidence distribution
- First-order and stochastic optimization methods for machine learning
- Divide-and-conquer information-based optimal subdata selection algorithm
- Asymptotic and finite-sample properties of estimators based on stochastic gradients
- Distributed inference for quantile regression processes
- Optimal subsampling for softmax regression
- On the Convergence of Decentralized Gradient Descent
- Forward Regression for Ultra-High Dimensional Variable Screening
- Randomized Algorithms for Matrices and Data
- A split-and-conquer approach for analysis of extraordinarily large data
- Random sampling with a reservoir
- Acceleration of Stochastic Approximation by Averaging
- Variable Metric Method for Minimization
- Regression Quantiles
- Reservoir-sampling algorithms of time complexity O(n(1 + log(N/n)))
- Mathematical Statistics
- Asymptotic Statistics
- First-Order Methods in Optimization
- Achieving Geometric Convergence for Distributed Optimization Over Time-Varying Graphs
- On the optimality of averaging in distributed statistical learning
- Batched Stochastic Gradient Descent with Weighted Sampling
- Decentralized Quasi-Newton Methods
- Sure Independence Screening for Ultrahigh Dimensional Feature Space
- Optimization Methods for Large-Scale Machine Learning
- Feature Screening via Distance Correlation Learning
- On the Local and Superlinear Convergence of Quasi-Newton Methods
- Optimal Subsampling for Large Sample Logistic Regression
- A Distributed and Integrated Method of Moments for High-Dimensional Correlated Data Analysis
- Bayesian quantile regression for ordinal longitudinal data
- Least-Square Approximation for a Distributed System
- A Scalable Bootstrap for Massive Data
- Renewable Estimation and Incremental Inference in Generalized Linear Models with Streaming Data Sets
- Statistical Foundations of Data Science
- Accelerated Distributed Nesterov Gradient Descent
- Information-Based Optimal Subdata Selection for Big Data Linear Regression
- Communication-Efficient Distributed Statistical Inference
- Bayesian Spatial Quantile Regression
- Some methods of speeding up the convergence of iteration methods
- A Family of Variable-Metric Methods Derived by Variational Means
- Stochastic Estimation of the Maximum of a Regression Function
- A Stochastic Approximation Method
- Optimal subsampling for quantile regression in big data
- Optimal Distributed Subsampling for Maximum Quasi-Likelihood Estimators With Massive Data
- Automatic, dynamic, and nearly optimal learning rate specification via local quadratic approximation
- Subsampling and Jackknifing: A Practically Convenient Solution for Large Data Analysis With Limited Computational Resources
- Distributed Empirical Likelihood Approach to Integrating Unbalanced Datasets
- Online Covariance Matrix Estimation in Stochastic Gradient Descent
- First-Order Newton-Type Estimator for Distributed Estimation and Inference
- Communication-Efficient Accurate Statistical Estimation
- An Asymptotic Analysis of Random Partition Based Minibatch Momentum Methods for Linear Regression Models
- Statistical Analysis of Fixed Mini-Batch Gradient Descent Estimator
- Network Gradient Descent Algorithm for Decentralized Federated Learning
- Optimal Subsampling Bootstrap for Massive Data
- Fast and robust sparsity learning over networks: a decentralized surrogate median regression approach
- Quasi-Newton updating for large-scale distributed learning
- Estimation and Inference for Multi-Kink Quantile Regression
- A Note on Distributed Quantile Regression by Pilot Sampling and One-Step Updating
- Feature Screening for Massive Data Analysis by Subsampling
- Sequential one-step estimator by sub-sampling for customer churn analysis with massive data sets
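The three strategies named in the title (distributed computing, subsampling, and minibatch techniques) are developed in the publication itself rather than on this page. Purely as an illustrative aside, and not drawn from the paper, the sketch below shows a generic minibatch stochastic gradient descent estimator for least-squares regression; the function name minibatch_sgd and the defaults for batch_size, lr, and epochs are hypothetical choices made for this example.

```python
import numpy as np

def minibatch_sgd(X, y, batch_size=32, lr=0.01, epochs=20, seed=0):
    """Illustrative minibatch SGD for least-squares linear regression."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    theta = np.zeros(p)
    for _ in range(epochs):
        perm = rng.permutation(n)                # random reshuffling each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (Xb @ theta - yb) / len(idx)   # minibatch gradient
            theta -= lr * grad
    return theta

# Toy usage: recover a known coefficient vector from simulated data.
rng = np.random.default_rng(1)
X = rng.normal(size=(10_000, 5))
beta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta + rng.normal(scale=0.1, size=10_000)
print(minibatch_sgd(X, y))
```

Reshuffling the index set at each epoch mirrors the minibatch sampling schemes studied in several of the cited works (for example, "Why random reshuffling beats stochastic gradient descent").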