Pages that link to "Item:Q3549230"
From MaRDI portal
The following pages link to Anatomy of high-performance matrix multiplication (Q3549230):
- Scientific computations on multi-core systems using different programming frameworks (Q268844)
- Towards an efficient use of the BLAS library for multilinear tensor contractions (Q272476)
- Penalized splines for smooth representation of high-dimensional Monte Carlo datasets (Q313059)
- Upper and lower I/O bounds for pebbling \(r\)-pyramids (Q450538)
- Deriving dense linear algebra libraries (Q469358)
- PARFES: A method for solving finite element linear equations on multi-core computers (Q614000)
- The evaluation of American options in a stochastic volatility model with jumps: an efficient finite element approach (Q614340)
- Fast verified solutions of linear systems (Q849174)
- Blocked algorithms for the reduction to Hessenberg-triangular form revisited (Q960033)
- Heterogeneous computing on mixed unstructured grids with pyfr (Q1645961)
- Parallel direct solver for solving systems of linear equations resulting from finite element method on multi-core desktops and workstations (Q2006579)
- Direct reconstruction method for discontinuous Galerkin methods on higher-order mixed-curved meshes III. Code optimization via tensor contraction (Q2028168)
- The matrix reloaded: multiplication strategies in FrodoKEM (Q2149815)
- GMRES with embedded ensemble propagation for the efficient solution of parametric linear systems in uncertainty quantification of computational models (Q2236142)
- Modulated rotating waves in the magnetised spherical Couette system (Q2282802)
- Safe feature elimination for non-negativity constrained convex optimization (Q2302836)
- A comparison of high-order time integrators for thermal convection in rotating spherical shells (Q2638300)
- High dimensional tori and chaotic and intermittent transients in magnetohydrodynamic Couette flows (Q2684112)
- BLIS: a framework for rapidly instantiating BLAS functionality (Q2828133)
- Parallel Matrix Multiplication: A Systematic Journey (Q2954477)
- Oscillatory Convection in Rotating Spherical Shells: Low Prandtl Number and Non-Slip Boundary Conditions (Q3449036)
- High-Performance Tensor Contraction without Transposition (Q4600011)
- (Q4636971)
- Automatic generation of fast algorithms for matrix–vector multiplication (Q4641573)
- Implementing High-Performance Complex Matrix Multiplication via the 1M Method (Q5131971)
- Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance (Q5177205)
- A Componentwise Splitting Method for Pricing American Options Under the Bates Model (Q5189607)
- Continuation and stability of rotating waves in the magnetized spherical Couette system: secondary transitions and multistability (Q5243638)
- Computing the Gradient in Optimization Algorithms for the CP Decomposition in Constant Memory through Tensor Blocking (Q5258607)
- Analytical Modeling Is Enough for High-Performance BLIS (Q5270773)
- Householder QR Factorization With Randomization for Column Pivoting (HQRRP) (Q5738147)
- Dominant speed factors of active set methods for fast MPC (Q5745039)
- Strassen's Algorithm for Tensor Contraction (Q5745128)
- Multidimensional Array Data Management (Q5886004)
- An efficient implementation of two-component relativistic density functional theory with torque-free auxiliary variables (Q6108598)
- Architecture-based and target-oriented algorithm optimization of high-order methods via complete-search tensor contraction (Q6155459)
- A high-performance implementation of atomistic spin dynamics simulations on x86 CPUs (Q6167684)
- Numerical stability of algorithms at extreme scale and low precisions (Q6200205)
- Parameter estimation via time modeling for MLIR implementation of GEMM (Q6588742)
- Cache optimization and performance modeling of batched, small, and rectangular matrix multiplication on Intel, AMD, and Fujitsu processors (Q6601380)
- Algorithm 1039: automatic generators for a family of matrix multiplication routines with Apache TVM (Q6604157)