Title: Parallel sparse matrix-matrix multiplication: a scalable solution with 1D algorithm

Authors: Mohammad Asadul Hoque; Md Rezaul Karim Raju; Christopher John Tymczak; Daniel Vrinceanu; Kiran Chilakamarri

Addresses: Department of Computing, East Tennessee State University, Box 70711, Johnson City, TN 37614-1710, USA ' Department of Computer Science, Texas Southern University, 3100 Cleburne St., Houston, TX 77004, USA ' Department of Physics, Texas Southern University, 3100 Cleburne St., Houston, TX 77004, USA ' Department of Physics, Texas Southern University, 3100 Cleburne St., Houston, TX 77004, USA ' Department of Mathematics, Texas Southern University, 3100 Cleburne St., Houston, TX 77004, USA

Abstract: This paper presents a novel implementation of parallel sparse matrix-matrix multiplication using distributed memory systems on heterogeneous hardware architecture. The proposed algorithm is expected to be linearly scalable up to several thousands of processors for matrices with dimensions over 106 (million). Our approach of parallelism is based on 1D decomposition and can work for both structured and unstructured sparse matrices. The storage mechanism is based on distributed hash lists, which reduces the latency for accessing and modifying an element of the product matrix, while reducing the overall merging time of the partial results computed by the processors. Theoretically, the time and space complexity of our algorithm is linearly proportional to the total number of non-zero elements in the product matrix C. The results of the performance evaluation show that the algorithm scales much better for sparse matrices with bigger dimensions. The speedup achieved using our algorithm is much better than other existing 1D algorithms. We have been able to achieve about 500 times speedup with only 672 processors. We also identified the impact of hardware architecture on scalability.

Keywords: MPI; message passing interface; scalability; sparse matrix; parallel algorithm; distributed computing; sparse matrix-matrix multiplication; parallelism; distributed hash lists.

DOI: 10.1504/IJCSE.2015.073498

International Journal of Computational Science and Engineering, 2015 Vol.11 No.4, pp.391 - 401

Received: 20 Sep 2013
Accepted: 28 Oct 2013

Published online: 10 Dec 2015 *

Full-text access for editors Full-text access for subscribers Purchase this article Comment on this article