| Publication | Date of Publication | Type |
|---|---|---|
| Generalizing Random Butterfly Transforms to Arbitrary Matrix Sizes | 2023-12-14 | Paper |
| Mixed-Precision Orthogonalization Scheme and Adaptive Step Size for Improving the Stability and Performance of CA-GMRES on GPUs | 2022-12-09 | Paper |
| Heterogenous Acceleration for Linear Algebra in Multi-coprocessor Environments | 2022-12-09 | Paper |
| Accelerating Computation of Eigenvectors in the Dense Nonsymmetric Eigenvalue Problem | 2022-12-09 | Paper |
| A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines | 2022-02-01 | Paper |
| Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems | 2021-10-29 | Paper |
| Numerical algorithms for high-performance computational science | 2021-06-15 | Paper |
| Improving the Performance of the GMRES Method using Mixed-Precision Techniques | 2020-11-03 | Paper |
| Linear Systems Solvers for Distributed-Memory Machines with GPU Accelerators | 2020-07-20 | Paper |
| The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale | 2018-11-12 | Paper |
| Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU | 2018-07-20 | Paper |
| ParILUT---A New Parallel Threshold ILU Factorization | 2018-07-18 | Paper |
| High-performance matrix-matrix multiplications of very small matrices | 2018-01-11 | Paper |
| Highly Scalable Self-Healing Algorithms for High Performance Scientific Computing | 2017-08-08 | Paper |
| Rectangular full packed format for Cholesky's algorithm | 2017-05-19 | Paper |
| Updating incomplete factorization preconditioners for model order reduction | 2016-11-18 | Paper |
| Linear algebra software for large-scale accelerated multicore computing | 2016-07-08 | Paper |
| Accelerating Numerical Dense Linear Algebra Calculations with GPUs | 2015-07-03 | Paper |
| Mixed-Precision Cholesky QR Factorization and Its Case Studies on Multicore CPU with Multiple GPUs | 2015-06-10 | Paper |
| Communication-Avoiding Symmetric-Indefinite Factorization | 2015-04-21 | Paper |
| Accelerating Linear System Solutions Using Randomization Techniques | 2014-09-12 | Paper |
| Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms | 2014-09-12 | Paper |
| High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures | 2014-09-12 | Paper |
| Designing LU-QR hybrid solvers for performance and stability | 2014-01-21 | Paper |
| Changes in Dense Linear Algebra Kernels: Decades-Long Perspective | 2013-09-26 | Paper |
| Toward a High Performance Tile Divide and Conquer Algorithm for the Dense Symmetric Eigenvalue Problem | 2013-03-06 | Paper |
| https://portal.mardi4nfdi.de/entity/Q3145773 | 2012-12-23 | Paper |
| High-performance computing systems: Status and outlook | 2012-10-12 | Paper |
| Divide and Conquer on Hybrid GPU-Accelerated Multicore Systems | 2012-08-23 | Paper |
| High-performance high-resolution semi-Lagrangian tracer transport on a sphere | 2011-12-28 | Paper |
| Computing the conditioning of the components of a linear least-squares solution | 2011-06-29 | Paper |
| Accelerating GPU Kernels for Dense Linear Algebra | 2011-03-08 | Paper |
| A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators | 2011-03-08 | Paper |
| Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures | 2011-03-08 | Paper |
| Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing | 2010-11-26 | Paper |
| Accelerating scientific computations with mixed precision algorithms | 2010-10-28 | Paper |
| Towards dense linear algebra for hybrid GPU accelerated manycore systems | 2010-09-02 | Paper |
| Revisiting Matrix Product on Master-Worker Platforms | 2009-02-26 | Paper |
| Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy | 2008-12-21 | Paper |
| State-of-the-art eigensolvers for electronic structure calculations of large scale nano-systems | 2008-07-29 | Paper |
| The Problem with the Linpack Benchmark Matrix Generator | 2008-06-30 | Paper |
| The use of bulk states to accelerate the band edge state calculation of a semiconductor quantum dot | 2007-05-23 | Paper |
| Large-Scale Scientific Computing | 2006-11-21 | Paper |
| Condition Numbers of Gaussian Random Matrices | 2006-05-31 | Paper |
| Computational Science - ICCS 2004 | 2005-12-23 | Paper |
| Computational Science – ICCS 2005 | 2005-11-30 | Paper |
| Computational Science – ICCS 2005 | 2005-11-30 | Paper |
| Euro-Par 2004 Parallel Processing | 2005-08-23 | Paper |
| https://portal.mardi4nfdi.de/entity/Q3046367 | 2004-08-12 | Paper |
| https://portal.mardi4nfdi.de/entity/Q4436930 | 2003-12-04 | Paper |
| https://portal.mardi4nfdi.de/entity/Q4420596 | 2003-08-18 | Paper |
| https://portal.mardi4nfdi.de/entity/Q4420604 | 2003-08-18 | Paper |
| https://portal.mardi4nfdi.de/entity/Q4411983 | 2003-07-13 | Paper |
| Automatic translation of Fortran to JVM bytecode | 2003-03-25 | Paper |
| Key concepts for parallel out-of-core LU factorization | 2003-03-19 | Paper |
| NetBuild: transparent cross‐platform access to computational software libraries | 2003-02-20 | Paper |
| Innovations of the NetSolve Grid Computing System | 2003-02-20 | Paper |
| The design and implementation of the parallel out-of-core ScaLAPACK LU, QR, and Cholesky factorization routines | 2003-02-04 | Paper |
| Middleware for the use of storage in communication. | 2003-01-21 | Paper |
| HARNESS fault tolerant MPI design, usage and performance issues | 2003-01-21 | Paper |
| https://portal.mardi4nfdi.de/entity/Q4787342 | 2003-01-06 | Paper |
| https://portal.mardi4nfdi.de/entity/Q4784913 | 2002-12-12 | Paper |
| https://portal.mardi4nfdi.de/entity/Q4552060 | 2002-08-28 | Paper |
| Static tiling for heterogeneous computing platforms. | 2002-07-25 | Paper |
| Clusters and computational grids for scientific computing | 2002-07-14 | Paper |
| Telescoping languages: A strategy for automatic generation of scientific problem-solving systems from annotated libraries | 2002-07-04 | Paper |
| https://portal.mardi4nfdi.de/entity/Q2780668 | 2002-07-02 | Paper |
| https://portal.mardi4nfdi.de/entity/Q4537068 | 2002-06-25 | Paper |
| https://portal.mardi4nfdi.de/entity/Q2779333 | 2002-04-15 | Paper |
| HARNESS and fault tolerant MPI | 2002-03-03 | Paper |
| LAPACK95 Users' Guide | 2002-02-18 | Paper |
| Automated empirical optimizations of software and the ATLAS project | 2001-08-20 | Paper |
| Numerical linear algebra algorithms and software | 2000-12-19 | Paper |
| https://portal.mardi4nfdi.de/entity/Q4942238 | 2000-07-20 | Paper |
| https://portal.mardi4nfdi.de/entity/Q4938107 | 2000-06-25 | Paper |
| https://portal.mardi4nfdi.de/entity/Q4945576 | 2000-03-23 | Paper |
| https://portal.mardi4nfdi.de/entity/Q3147911 | 2000-01-01 | Paper |
| A Parallel Divide and Conquer Algorithm for the Symmetric Eigenvalue Problem on Distributed Memory Architectures | 1999-11-24 | Paper |
| Using agent-based software for scientific computing in the NetSolve system | 1999-01-12 | Paper |
| Numerical Linear Algebra for High-Performance Computers | 1998-10-05 | Paper |
| Key concepts for parallel out-of-core LU factorization | 1998-07-23 | Paper |
| A set of level 3 basic linear algebra subprograms | 1998-03-23 | Paper |
| Algorithm 679: A set of level 3 basic linear algebra subprograms: model implementation and test programs | 1998-03-23 | Paper |
| Algorithm 710: FORTRAN subroutines for computing the eigenvalues and eigenvectors of a general matrix by reduction to general tridiagonal form | 1998-02-09 | Paper |
| Software distribution using Xnetlib | 1998-01-26 | Paper |
| Software Libraries for Linear Algebra Computations on High Performance Computers | 1997-11-02 | Paper |
| Chebyshev tau-QZ algorithm methods for calculating spectra of hydrodynamic stability problems | 1997-08-14 | Paper |
| A parallel algorithm for the reduction of a nonsymmetric matrix to block upper-Hessenberg form | 1997-02-28 | Paper |
| Parallel matrix transpose algorithms on distributed memory concurrent computers | 1997-02-28 | Paper |
| Algorithmic bombardment for the iterative solution of linear systems: A poly-iterative approach | 1997-01-07 | Paper |
| https://portal.mardi4nfdi.de/entity/Q4891458 | 1996-09-05 | Paper |
| https://portal.mardi4nfdi.de/entity/Q4860266 | 1996-06-17 | Paper |
| The design of a parallel dense linear algebra software library: Reduction to Hessenberg, tridiagonal, and bidiagonal form | 1996-06-16 | Paper |
| https://portal.mardi4nfdi.de/entity/Q4855980 | 1996-03-05 | Paper |
| https://portal.mardi4nfdi.de/entity/Q4325973 | 1995-03-13 | Paper |
| The PVM concurrent computing system: Evolution, experiences, and trends | 1995-01-29 | Paper |
| https://portal.mardi4nfdi.de/entity/Q4288941 | 1994-12-04 | Paper |
| https://portal.mardi4nfdi.de/entity/Q4288943 | 1994-10-09 | Paper |
| https://portal.mardi4nfdi.de/entity/Q3139427 | 1993-11-15 | Paper |
| A Parallel Algorithm for the Nonsymmetric Eigenvalue Problem | 1993-11-11 | Paper |
| Reduction to condensed form for the eigenvalue problem on distributed memory architectures | 1993-01-17 | Paper |
| https://portal.mardi4nfdi.de/entity/Q3997799 | 1992-09-17 | Paper |
| Numerical Considerations in Computing Invariant Subspaces | 1992-06-28 | Paper |
| https://portal.mardi4nfdi.de/entity/Q5750327 | 1990-01-01 | Paper |
| Block reduction of matrices to condensed forms for eigenvalue computations | 1989-01-01 | Paper |
| Programming methodology and performance issues for advanced computer architectures | 1988-01-01 | Paper |
| Tools to aid in the analysis of memory access patterns for FORTRAN programs | 1988-01-01 | Paper |
| https://portal.mardi4nfdi.de/entity/Q3482749 | 1988-01-01 | Paper |
| An extended set of FORTRAN basic linear algebra subprograms | 1988-01-01 | Paper |
| Algorithm 656: an extended set of basic linear algebra subprograms: model implementation and test programs | 1988-01-01 | Paper |
| https://portal.mardi4nfdi.de/entity/Q3793601 | 1988-01-01 | Paper |
| Corrigenda: “An Extended Set of FORTRAN Basic Linear Algebra Subprograms” | 1988-01-01 | Paper |
| Solving banded systems on a parallel processor | 1987-01-01 | Paper |
| A Fully Parallel Algorithm for the Symmetric Eigenvalue Problem | 1987-01-01 | Paper |
| https://portal.mardi4nfdi.de/entity/Q3819886 | 1987-01-01 | Paper |
| Squeezing the most out of eigenvalue solvers on high-performance computers | 1986-01-01 | Paper |
| Implementation of some concurrent algorithms for matrix factorization | 1986-01-01 | Paper |
| Linear algebra on high performance computers | 1986-01-01 | Paper |
| Implementing Dense Linear Algebra Algorithms Using Multitasking on the CRAY X-MP-4 (or Approaching the Gigaflop) | 1986-01-01 | Paper |
| On some parallel banded system solvers | 1985-01-01 | Paper |
| A collection of parallel linear equations routines for the Denelcor HEP | 1984-01-01 | Paper |
| https://portal.mardi4nfdi.de/entity/Q3217518 | 1984-01-01 | Paper |
| Improving the Accuracy of Computed Singular Values | 1983-01-01 | Paper |
| Improving the Accuracy of Computed Eigenvalues and Eigenvectors | 1983-01-01 | Paper |
| Algorithm 589: SICEDR: A FORTRAN Subroutine for Improving the Accuracy of Computed Matrix Eigenvalues | 1982-01-01 | Paper |
| https://portal.mardi4nfdi.de/entity/Q3932291 | 1980-01-01 | Paper |
| Unrolling loops in Fortran | 1979-01-01 | Paper |
| Matrix eigensystem routines. EISPACK guide extension | 1977-01-01 | Paper |