Communication-optimal Parallel and Sequential QR and LU Factorizations

DOI: 10.1137/080731992; zbMATH: 1241.65028; arXiv: 0808.2664; OpenAlex: W2157237396; Wikidata: Q114074373; Scholia: Q114074373; MaRDI QID: Q2882786

Laura Grigori, Mark Hoemmen, Julien Langou, James W. Demmel

Publication date: 7 May 2012

Published in: SIAM Journal on Scientific Computing

Full work available at URL: https://arxiv.org/abs/0808.2664

zbMATH Keywords

comparison of methods; numerical examples; parallel computation; parallel algorithm; LU factorization; QR factorization; rectangular matrices; ``non-Strassen-like'' QR; Householder QR; ScaLAPACK algorithms

Mathematics Subject Classification ID

Factorization of matrices (15A23); Numerical solutions to overdetermined systems, pseudoinverses (65F20); Parallel numerical computation (65Y05); Complexity and performance of numerical algorithms (65Y20); Direct numerical methods for linear systems and matrix inversion (65F05); Orthogonalization in numerical linear algebra (65F25)

Related Items

Randomized numerical linear algebra: Foundations and algorithms; A parallel algorithm for calculation of determinants and minors using arbitrary precision arithmetic; Randomized algorithms for distributed computation of principal component analysis and singular value decomposition; Data Driven Modal Decompositions: Analysis and Enhancements; Increasing the Performance of the Jacobi--Davidson Method by Blocking; The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale; Gram-Schmidt orthogonalization: 100 years and more; Scaling Up Parallel Computation of Tiled QR Factorizations by a Distributed Scheduling Runtime System and Analytical Modeling; Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing; Performance of the Low-Rank TT-SVD for Large Dense Tensors on Modern MultiCore CPUs; Parallel \(\mathcal {H}\)-matrix arithmetic on distributed-memory systems; High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations; Randomized QR with Column Pivoting; Simultaneous multidiagonalization for the CS decomposition; TTDFT: a GPU accelerated Tucker tensor DFT code for large-scale Kohn-Sham DFT calculations; Mixed Precision Iterative Refinement with Sparse Approximate Inverse Preconditioning; Householder Orthogonalization with a Nonstandard Inner Product; A novel parallel algorithm based on the Gram-Schmidt method for tridiagonal linear systems of equations; A distributed block Chebyshev-Davidson algorithm for parallel spectral clustering; On sampling determinantal and Pfaffian point processes on a quantum computer; Adaptively restarted block Krylov subspace methods with low-synchronization skeletons; A fast randomized algorithm for computing an approximate null space; GPU parameter tuning for tall and skinny dense linear least squares problems; GMRES algorithms over 35 years; Low Rank Approximation of a Sparse Matrix Based on LU Factorization with Column and Row Tournament Pivoting; FaIMS: a fast algorithm for the inverse medium problem with multiple frequencies and multiple sources for the scalar Helmholtz equation; Avoiding Communication in Primal and Dual Block Coordinate Descent Methods; A parallel and streaming dynamic mode decomposition algorithm with finite precision error analysis for large data; Randomized Projection for Rank-Revealing Matrix Factorizations and Low-Rank Approximations; Cholesky and Gram-Schmidt Orthogonalization for Tall-and-Skinny QR Factorizations on Graphics Processors; Parallel QR Factorization of Block-Tridiagonal Matrices; Introduction to Communication Avoiding Algorithms for Direct Methods of Factorization in Linear Algebra; GMRES with embedded ensemble propagation for the efficient solution of parametric linear systems in uncertainty quantification of computational models; Varying the \(s\) in your \(s\)-step GMRES; Adapting Regularized Low-Rank Models for Parallel Architectures; Iteratively Reweighted FGMRES and FLSQR for Sparse Reconstruction; Backward error analysis of the AllReduce algorithm for Householder QR decomposition; Linear algebra software for large-scale accelerated multicore computing; Communication lower bounds and optimal algorithms for numerical linear algebra; Towards dense linear algebra for hybrid GPU accelerated manycore systems; Fast multipole preconditioners for sparse matrices arising from elliptic equations; Linear-time CUR approximation of BEM matrices; Random projections for Bayesian regression; Enlarged Krylov Subspace Conjugate Gradient Methods for Reducing Communication; Shifted Cholesky QR for Computing the QR Factorization of Ill-Conditioned Matrices; A Randomized Blocked Algorithm for Efficiently Computing Rank-revealing Factorizations of Matrices; Communication-Efficient Distributed Statistical Inference; Considerations on the Implementation and Use of Anderson Acceleration on Distributed Memory and GPU-based Parallel Computers; A Distributed and Incremental SVD Algorithm for Agglomerative Data Analysis on Large Networks; Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects; Block Gram-Schmidt algorithms and their stability properties; Scalable Linear Solvers Based on Enlarged Krylov Subspaces with Dynamic Reduction of Search Directions; Numerical algorithms for high-performance computational science; Robust and Accurate Stopping Criteria for Adaptive Randomized Sampling in Matrix-Free Hierarchically Semiseparable Construction; Block Modified Gram--Schmidt Algorithms and Their Analysis; Rounding Error Analysis of Mixed Precision Block Householder QR Algorithms; Communication Avoiding ILU0 Preconditioner; Mixed-Precision Cholesky QR Factorization and Its Case Studies on Multicore CPU with Multiple GPUs; The red-blue pebble game on trees and DAGs with large input; Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems; Parallel Algorithms for Tensor Train Arithmetic

Uses Software