Optimizing Block-Sparse Matrix Multiplications on CUDA with TVM
arXiv: 2007.13055; MaRDI QID: Q6345838
Author: Zijing Gu
Publication date: 26 July 2020
Abstract: We implemented and optimized matrix multiplications between dense and block-sparse matrices on CUDA. We leveraged TVM, a deep learning compiler, to explore the schedule space of the operation and generate efficient CUDA code. With TVM's automatic parameter tuning, our implementation based on cross-thread reduction achieved performance competitive with or better than other state-of-the-art frameworks.
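The record does not include the paper's TVM schedule or CUDA kernels. The sketch below is only a NumPy reference for the *semantics* of a dense × block-sparse product, so the operation being optimized is concrete. It assumes the weight matrix is stored in block compressed sparse row (BSR) form as the arrays `data`, `indices`, and `indptr`, the same layout `scipy.sparse.bsr_matrix` uses. The helper names `dense_to_bsr` and `dense_bsr_matmul` are hypothetical. The sketch does not reproduce the paper's cross-thread reduction or tuning.

```python
import numpy as np


def dense_to_bsr(w, block_shape):
    """Convert a dense matrix to BSR arrays (data, indices, indptr).

    Hypothetical helper for illustration: all-zero blocks are dropped,
    which is what makes the stored matrix block-sparse.
    """
    br, bc = block_shape
    k, n = w.shape
    assert k % br == 0 and n % bc == 0, "shape must be divisible by block shape"
    data, indices, indptr = [], [], [0]
    for i in range(k // br):                      # block rows
        for j in range(n // bc):                  # block columns
            block = w[i * br:(i + 1) * br, j * bc:(j + 1) * bc]
            if np.any(block):
                data.append(block)
                indices.append(j)
        indptr.append(len(indices))               # end of block row i
    data = np.array(data, dtype=w.dtype).reshape(-1, br, bc)
    return data, np.array(indices, dtype=np.int64), np.array(indptr, dtype=np.int64)


def dense_bsr_matmul(x, data, indices, indptr, n):
    """Reference Y = X @ W for dense X (M x K) and BSR-encoded W (K x n)."""
    m = x.shape[0]
    _, br, bc = data.shape
    y = np.zeros((m, n), dtype=np.result_type(x, data))
    for i in range(len(indptr) - 1):              # block row of W
        x_slice = x[:, i * br:(i + 1) * br]       # matching K-slice of X
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]                        # block column of W
            # Each stored block contributes one small dense GEMM.
            y[:, j * bc:(j + 1) * bc] += x_slice @ data[p]
    return y


# Usage: zero one 4x4 block and check against the dense product.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 12))
w[0:4, 4:8] = 0.0
x = rng.standard_normal((3, 8))
data, indices, indptr = dense_to_bsr(w, (4, 4))
y = dense_bsr_matmul(x, data, indices, indptr, w.shape[1])
assert np.allclose(y, x @ w)
```

Only the stored blocks are touched, so work scales with the number of nonzero blocks rather than with K × N. In a GPU implementation, the loops over output blocks and the reduction over each block's K-slice are what a schedule would map onto threads. How the paper does this is not shown here.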
Companion code repository: https://github.com/ceruleangu/Block-Sparse-Benchmark