Communication lower bounds and optimal algorithms for numerical linear algebra
From MaRDI portal
Publication:4683913
DOI10.1017/S0962492914000038zbMath1396.65082MaRDI QIDQ4683913
Mark Hoemmen, Nicholas Knight, Grey Ballard, Erin Claire Carson, Oded Schwartz, James W. Demmel
Publication date: 26 September 2018
Published in: Acta Numerica (Search for Journal in Brave)
Research exposition (monographs, survey articles) pertaining to numerical analysis (65-02) Complexity and performance of numerical algorithms (65Y20) Numerical linear algebra (65Fxx)
Related Items
Performance of the Low-Rank TT-SVD for Large Dense Tensors on Modern MultiCore CPUs, Limited‐memory polynomial methods for large‐scale matrix functions, Parallel Matrix Multiplication: A Systematic Journey, Analytical modeling of matrix–vector multiplication on multicore processors, Communication Lower Bounds and Optimal Algorithms for Multiple Tensor-Times-Matrix Computation, Adaptively restarted block Krylov subspace methods with low-synchronization skeletons, GMRES algorithms over 35 years, Fast matrix multiplication and its algebraic neighbourhood, Avoiding Communication in Primal and Dual Block Coordinate Descent Methods, Predict-and-Recompute Conjugate Gradient Variants, Energy aware performance study for a class of computationally intensive Monte Carlo algorithms, A Robust and Efficient Implementation of LOBPCG, How Bad Are Vandermonde Matrices?, Block Gram-Schmidt algorithms and their stability properties, Accuracy of the $s$-Step Lanczos Method for the Symmetric Eigenproblem in Finite Precision, Random multipliers numerically stabilize Gaussian and block Gaussian elimination: proofs and an extension to low-rank approximation, Parallel Algorithms for Tensor Train Arithmetic
Uses Software
Cites Work
- Communication-optimal Parallel and Sequential Cholesky Decomposition
- Graph expansion and communication costs of fast matrix multiplication
- Multiplying matrices faster than coppersmith-winograd
- Parallel out-of-core computation and updating of the QR factorization
- An updated set of basic linear algebra subprograms (BLAS)
- The Lanczos and Conjugate Gradient Algorithms
- Solving linear least squares problems by Gram-Schmidt orthogonalization
- Round off error analysis for Gram-Schmidt method and solution of linear least squares problems
- On the reduction of a symmetric matrix to tridiagonal form
- Nested Dissection of a Regular Finite Element Mesh
- Complexity Bounds for Regular Finite Difference and Finite Element Grids
- A Theorem on Boolean Matrices
- An inequality related to the isoperimetric inequality
- The principle of minimized iterations in the solution of the matrix eigenvalue problem
- Methods of conjugate gradients for solving linear systems
- Bulk-synchronous parallel Gaussian elimination
- Extending the Hong-Kung model to memory hierarchies
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Tensor Decompositions and Applications
- On the generation of Krylov subspace bases
- Communication efficient matrix multiplication on hypercubes
- Optimal sparse matrix dense vector multiplication in the I/O-model
- Parallel iterative S-step methods for unsymmetric linear systems
- A performance model for Krylov subspace methods on mesh-based parallel computers
- Parallel two-stage reduction to Hessenberg form using dynamic scheduling on shared-memory architectures
- The cache complexity of multithreaded cache oblivious algorithms
- The loss of orthogonality in the Gram-Schmidt orthogonalization process
- A note on the error analysis of classical Gram-Schmidt
- Fast matrix multiplication is stable
- Communication complexity of PRAMs
- Newton interpolation at Leja points
- The analysis of a nested dissection algorithm
- s-step iterative methods for symmetric linear systems
- Memory-efficient matrix multiplication in the BSP model
- Size bounds for superconcentrators
- An efficient nonsymmetric Lanczos method on parallel vector computers
- Matrix eigensystem routines - EISPACK guide. 2nd ed
- Efficient out-of-core algorithms for linear relaxation using blocking covers
- An inverse free parallel spectral divide and conquer algorithm for nonsymmetric eigenproblems
- Numerical behaviour of the modified Gram-Schmidt GMRES implementation
- Size-estimation framework with applications to transitive closure and reachability
- Cache optimization for structured and unstructured grid multigrid
- Processor-time tradeoffs under bounded-speed message propagation. II: Lower bounds
- On the efficient implementation of preconditioned s-step conjugate gradient methods on multiprocessors with memory hierarchy
- Communication lower bounds for distributed-memory matrix multiplication
- Reliable updated residuals in hybrid Bi-CG methods
- Finite bounds for Hölder-Brascamp-Lieb multilinear inequalities
- Fast linear algebra is stable
- Gaussian elimination is not optimal
- When cache blocking of sparse matrix vector multiply works and why
- Accuracy of Two Three-term and Three Two-term Recurrences for Krylov Space Solvers
- The Multishift QR Algorithm. Part I: Maintaining Well-Focused Shifts and Level 3 Performance
- The Multishift QR Algorithm. Part II: Aggressive Early Deflation
- Algorithm 953
- Tight Bounds for Low Dimensional Star Stencils in the External Memory Model
- Avoiding Communication in Nonsymmetric Lanczos-Based Krylov Subspace Methods
- A Residual Replacement Strategy for Improving the Maximum Attainable Accuracy of $s$-Step Krylov Subspace Methods
- Communication-optimal Parallel and Sequential QR and LU Factorizations
- Partitioned Triangular Tridiagonalization
- The university of Florida sparse matrix collection
- Fast algorithms for hierarchically semiseparable matrices
- Minimizing Communication in Numerical Linear Algebra
- CALU: A Communication Optimal LU Factorization Algorithm
- On the Impact of Communication Complexity on the Design of Parallel Numerical Algorithms
- Analysis of Pairwise Pivoting in Gaussian Elimination
- Stability Analysis and Improvement of the Block Gram–Schmidt Algorithm
- The Lanczos and conjugate gradient algorithms in finite precision arithmetic
- Communication Avoiding Rank Revealing QR Factorization with Column Pivoting
- Implementation of the GMRES Method Using Householder Transformations
- Average-Case Stability of Gaussian Elimination
- Cache efficient bidiagonalization using BLAS 2.5 operators
- The I/O Complexity of Sparse Matrix Dense Matrix Multiplication
- Evaluating Non-square Sparse Bilinear Forms on Multiple Vector Pairs in the I/O-Model
- Performance and Accuracy of LAPACK's Symmetric Tridiagonal Eigensolvers
- A Look-Ahead Lanczos Algorithm for Unsymmetric Matrices
- GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems
- Practical Use of Polynomial Preconditionings for the Conjugate Gradient Method
- The WY Representation for Products of Householder Matrices
- An extended set of FORTRAN basic linear algebra subprograms
- Algorithm 656: an extended set of basic linear algebra subprograms: model implementation and test programs
- A Storage-Efficient $WY$ Representation for Products of Householder Transformations
- Parallel Matrix and Graph Algorithms
- Implicit Application of Polynomial Filters in a k-Step Arnoldi Method
- Modification of the Householder Method Based on the Compact WY Representation
- Parallelizable restarted iterative methods for nonsymmetric linear systems. part I: Theory
- Some Stable Methods for Calculating Inertia and Solving Symmetric Linear Systems
- Basic Linear Algebra Subprograms for Fortran Usage
- A Newton basis GMRES implementation
- Estimating the Attainable Accuracy of Recursively Computed Residual Methods
- ScaLAPACK Users' Guide
- A set of level 3 basic linear algebra subprograms
- Algorithm 679: A set of level 3 basic linear algebra subprograms: model implementation and test programs
- Locality of Reference in LU Decomposition with Partial Pivoting
- A framework for symmetric band reduction
- Algorithm 807
- A Robust Criterion for the Modified Gram--Schmidt Algorithm with Selective Reorthogonalization
- Analysis of the finite precision bi-conjugate gradient algorithm for nonsymmetric linear systems
- Look-Ahead Procedures for Lanczos-Type Product Methods Based on Three-Term Lanczos Recurrences
- A Priori Sparsity Patterns for Parallel Sparse Approximate Inverse Preconditioners
- A Deflated Version of the Conjugate Gradient Algorithm
- Residual Replacement Strategies for Krylov Subspace Iterative Methods for the Convergence of True Residuals
- Templates for the Solution of Algebraic Eigenvalue Problems
- A Block Orthogonalization Procedure with Constant Synchronization Requirements
- Efficient Algorithms for Computing a Strong Rank-Revealing QR Factorization
- Graph Expansion Analysis for Communication Costs of Fast Rectangular Matrix Multiplication
- Block Gram–Schmidt Orthogonalization