Towards dense linear algebra for hybrid GPU accelerated manycore systems

Publication:991102

DOI: 10.1016/j.parco.2009.12.005
zbMath: 1204.68268
OpenAlex: W2162322364
MaRDI QID: Q991102

Marc Baboulin, Jack J. Dongarra, Stanimire Z. Tomov

Publication date: 2 September 2010

Published in: Parallel Computing

Full work available at URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.214.5312



Related Items

Direct numerical simulations of reacting flows with detailed chemistry using many-core/GPU acceleration
Computing Least Squares Condition Numbers on Hybrid Multicore/GPU Systems
Extending the length and time scales of Gram-Schmidt Lyapunov vector computations
Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing
A parallel computing method using blocked format with optimal partitioning for SpMV on GPU
ELSI -- an open infrastructure for electronic structure solvers
GPU acceleration of all-electron electronic structure theory using localized numeric atom-centered basis functions
GPU-acceleration of the ELPA2 distributed eigensolver for dense symmetric and Hermitian eigenproblems
GPU parameter tuning for tall and skinny dense linear least squares problems
Quantum circuits synthesis using Householder transformations
A new approach to the lattice Boltzmann method for graphics processing units
GPU accelerated computation of the isogeometric analysis stiffness matrix
A new era in scientific computing: domain decomposition methods in hybrid CPU-GPU architectures
GPU-acceleration of stiffness matrix calculation and efficient initialization of EFG meshless methods
Adapting Regularized Low-Rank Models for Parallel Architectures
Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems
Unnamed Item
Direct numerical simulations of turbulent reacting flows with shock waves and stiff chemistry using many-core/GPU acceleration
Productivity, performance, and portability for computational fluid dynamics applications
A heterogeneous parallel LU factorization algorithm based on a basic column block uniform allocation strategy
Randomized GPU Algorithms for the Construction of Hierarchical Matrices from Matrix-Vector Operations
A linear algebra method to decompose forms whose length is lower than the number of variables into weighted sum of squares
Simulating Low Precision Floating-Point Arithmetic
Exploiting Lower Precision Arithmetic in Solving Symmetric Positive Definite Linear Systems and Least Squares Problems

