Optimizing Block-Sparse Matrix Multiplications on CUDA with TVM

arXiv: 2007.13055
MaRDI QID: Q6345838

Zijing Gu

Publication date: 26 July 2020

Abstract: We implemented and optimized matrix multiplication between dense and block-sparse matrices on CUDA. We used TVM, a deep learning compiler, to explore the schedule space of the operation and generate efficient CUDA code. With TVM's automatic parameter tuning, our implementation based on cross-thread reduction achieved performance competitive with or better than that of other state-of-the-art frameworks.
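
For readers unfamiliar with the operation named in the abstract, the following is a minimal NumPy reference sketch of a dense-times-block-sparse product. It is not the paper's TVM schedule or its generated CUDA code. It assumes the sparse operand is stored in the common BSR layout (`data`, `indices`, `indptr`) and that the product computed is `x @ w.T`; both are assumptions made for illustration, and the helper names `bsr_from_dense` and `dense_times_bsr_t` are hypothetical.

```python
import numpy as np


def bsr_from_dense(w, bs_r, bs_c):
    """Convert a dense matrix to BSR arrays, dropping all-zero blocks.

    Returns (data, indices, indptr): data has shape (nnz_blocks, bs_r, bs_c),
    indices holds the block-column of each stored block, and indptr marks
    where each block-row starts in data/indices.
    """
    n, k = w.shape
    assert n % bs_r == 0 and k % bs_c == 0, "shape must be divisible by block size"
    data, indices, indptr = [], [], [0]
    for bi in range(n // bs_r):
        for bj in range(k // bs_c):
            block = w[bi * bs_r:(bi + 1) * bs_r, bj * bs_c:(bj + 1) * bs_c]
            if np.any(block):
                data.append(block.copy())
                indices.append(bj)
        indptr.append(len(indices))
    data = np.array(data, dtype=w.dtype).reshape(-1, bs_r, bs_c)
    return data, np.array(indices, dtype=np.int64), np.array(indptr, dtype=np.int64)


def dense_times_bsr_t(x, data, indices, indptr):
    """Compute y = x @ w.T, where w is given by its BSR arrays."""
    m = x.shape[0]
    _, bs_r, bs_c = data.shape
    n_block_rows = len(indptr) - 1
    y = np.zeros((m, n_block_rows * bs_r), dtype=x.dtype)
    for bi in range(n_block_rows):
        # Reduce over the stored (nonzero) blocks of this block-row only.
        acc = np.zeros((m, bs_r), dtype=x.dtype)
        for p in range(indptr[bi], indptr[bi + 1]):
            bj = indices[p]
            acc += x[:, bj * bs_c:(bj + 1) * bs_c] @ data[p].T
        y[:, bi * bs_r:(bi + 1) * bs_r] = acc
    return y
```

The inner loop over `p` is the reduction axis: its length varies per block-row with the sparsity pattern. In a GPU implementation this is the kind of axis that a cross-thread reduction would split across threads, although how the paper's schedule does so is not shown here.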

Has companion code repository: https://github.com/ceruleangu/Block-Sparse-Benchmark
