
Title: An asynchronous and parallel row-wise compressed SpMV kernel on heterogeneous CPU-GPU architectures

Authors: Huachen Tan

Addresses: Department of Internet, Hunan Agricultural University, Changsha, Hunan, China

Abstract: Sparse matrix vector multiplication (SpMV) is a fundamental linear algebra operation used extensively in fields such as machine learning, data mining and numerical simulation. Accelerating SpMV therefore benefits the real-world applications that depend on it. Meanwhile, heterogeneous CPU-GPU architectures have become essential for realising high-performance computing. This paper therefore designs an asynchronous, parallel, row-wise compressed SpMV kernel that utilises heterogeneous CPU-GPU architectures to accelerate applications that use SpMV. First, the heterogeneous parallelisation of row-wise compressed SpMV based on the compressed sparse row (CSR) format is designed according to the memory access and control flow of SpMV and the architectural features of heterogeneous CPU-GPU systems. Next, an asynchronous method is designed for the parallel compressed SpMV kernel to create an additional level of parallelism in SpMV execution. The proposed SpMV kernel obtains up to 8.97% improvement from the asynchronous method on the heterogeneous CPU-GPU architecture and achieves nearly linear speedup across GPU threads.
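
To make the terms in the abstract concrete, the following is a minimal sketch of row-wise SpMV over the CSR format, with the rows split into independent blocks that are submitted as separate tasks. It is not the kernel proposed in the paper: it runs on the CPU in plain Python, and the function names (`csr_spmv_rows`, `csr_spmv_async`), the `num_blocks` parameter and the example matrix are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def csr_spmv_rows(row_ptr, col_idx, values, x, start, stop):
    """Compute y[start:stop] for y = A @ x, with A stored in CSR form."""
    y_block = []
    for row in range(start, stop):
        acc = 0.0
        # Non-zeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y_block.append(acc)
    return y_block


def csr_spmv_async(row_ptr, col_idx, values, x, num_blocks=2):
    """Split rows into contiguous blocks and submit each block as a task.

    Hypothetical helper: the block count and thread pool are illustrative
    choices, not the scheduling scheme described in the paper.
    """
    n_rows = len(row_ptr) - 1
    block = max(1, -(-n_rows // num_blocks))  # ceiling division
    with ThreadPoolExecutor(max_workers=num_blocks) as pool:
        futures = [
            pool.submit(csr_spmv_rows, row_ptr, col_idx, values, x,
                        start, min(start + block, n_rows))
            for start in range(0, n_rows, block)
        ]
        # Collect results in submission order so the output rows stay ordered.
        y = []
        for future in futures:
            y.extend(future.result())
    return y


# Example 3x3 matrix (assumed for illustration):
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_spmv_async(row_ptr, col_idx, values, x)
```

Each output row depends only on its own slice of `values` and `col_idx`, which is why the row blocks can be processed independently. The thread pool here only illustrates that decomposition; with pure-Python loops in standard CPython it is not expected to run faster than the sequential version.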

Keywords: heterogeneous CPU-GPU architectures; parallel; sparse matrix vector multiplication; SpMV.

DOI: 10.1504/IJES.2022.127162

International Journal of Embedded Systems, 2022 Vol.15 No.5, pp.377 - 384

Received: 28 Jan 2022
Received in revised form: 18 Apr 2022
Accepted: 18 May 2022
Published online: 23 Nov 2022


