Towards dense linear algebra for hybrid GPU accelerated manycore systems
Publication:991102
DOI: 10.1016/j.parco.2009.12.005; zbMath: 1204.68268; OpenAlex: W2162322364; MaRDI QID: Q991102
Marc Baboulin, Jack J. Dongarra, Stanimire Z. Tomov
Publication date: 2 September 2010
Published in: Parallel Computing
Full work available at URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.214.5312
Keywords: parallel algorithms; graphics processing units; multicore processors; dense linear algebra; hybrid computing
Mathematics Subject Classification: Parallel algorithms in computer science (68W10); Parallel numerical computation (65Y05); Computer system organization (68M99); Numerical linear algebra (65F99)
Related Items
- Direct numerical simulations of reacting flows with detailed chemistry using many-core/GPU acceleration
- Computing Least Squares Condition Numbers on Hybrid Multicore/GPU Systems
- Extending the length and time scales of Gram-Schmidt Lyapunov vector computations
- Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing
- A parallel computing method using blocked format with optimal partitioning for SpMV on GPU
- ELSI -- an open infrastructure for electronic structure solvers
- GPU acceleration of all-electron electronic structure theory using localized numeric atom-centered basis functions
- GPU-acceleration of the ELPA2 distributed eigensolver for dense symmetric and Hermitian eigenproblems
- GPU parameter tuning for tall and skinny dense linear least squares problems
- Quantum circuits synthesis using Householder transformations
- A new approach to the lattice Boltzmann method for graphics processing units
- GPU accelerated computation of the isogeometric analysis stiffness matrix
- A new era in scientific computing: domain decomposition methods in hybrid CPU-GPU architectures
- GPU-acceleration of stiffness matrix calculation and efficient initialization of EFG meshless methods
- Adapting Regularized Low-Rank Models for Parallel Architectures
- Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems
- Unnamed Item
- Direct numerical simulations of turbulent reacting flows with shock waves and stiff chemistry using many-core/GPU acceleration
- Productivity, performance, and portability for computational fluid dynamics applications
- A heterogeneous parallel LU factorization algorithm based on a basic column block uniform allocation strategy
- Randomized GPU Algorithms for the Construction of Hierarchical Matrices from Matrix-Vector Operations
- A linear algebra method to decompose forms whose length is lower than the number of variables into weighted sum of squares
- Simulating Low Precision Floating-Point Arithmetic
- Exploiting Lower Precision Arithmetic in Solving Symmetric Positive Definite Linear Systems and Least Squares Problems
Cites Work
- Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing
- Communication-optimal Parallel and Sequential QR and LU Factorizations
- Minimizing Communication in Numerical Linear Algebra
- Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy
- Out-of-core solution of linear systems on graphics processors
- LAPACK Users' Guide
- GEMM-based level 3 BLAS
- Accuracy and Stability of Numerical Algorithms