Cache optimization and performance modeling of batched, small, and rectangular matrix multiplication on Intel, AMD, and Fujitsu processors
From MaRDI portal
Publication: Q6601380
DOI: 10.1145/3595178
MaRDI QID: Q6601380
Rio Yokota, George Bosilca, Sameer Deshmukh
Publication date: 10 September 2024
Published in: ACM Transactions on Mathematical Software
Cites Work
- Title not available
- Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions
- High-performance matrix-matrix multiplications of very small matrices
- Communication lower bounds for distributed-memory matrix multiplication
- When cache blocking of sparse matrix vector multiply works and why
- BLIS: a framework for rapidly instantiating BLAS functionality
- Hierarchical Matrices: Algorithms and Analysis
- Cache-Oblivious Algorithms
- Stabilized rounded addition of hierarchical matrices
- Anatomy of high-performance matrix multiplication
- BLASFEO
- The BLAS API of BLASFEO
- Strategies for the Vectorized Block Conjugate Gradients Method
- Batched Triangular Dense Linear Algebra Kernels for Very Small Matrix Sizes on GPUs
- A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization
- Analytical Modeling Is Enough for High-Performance BLIS
- FLAME