Title: NIC-based reduction algorithms for large-scale clusters

Authors: Fabrizio Petrini, Adam Moody, Juan Fernandez, Eitan Frachtenberg, Dhabaleswar K. Panda

Addresses: Applied Computer Science Group, Pacific Northwest National Laboratory, Richland, WA 99352, USA; Integrated Computing and Communications Department, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA; Computer Engineering Department, University of Murcia, 30071 Murcia, Spain; Computer and Computational Sciences (CCS) Division, Los Alamos National Laboratory, NM 87545, USA; Department of Computer and Information Science, The Ohio State University, Columbus, OH 43210, USA

Abstract: Efficient reduction algorithms are crucial to many large-scale, parallel scientific applications. While previous algorithms constrain processing to the host CPU, we explore and utilise the processors in modern cluster Network Interface Cards (NICs). We present the design issues, solutions, analytical models, and experimental evaluations of a family of NIC-based reduction algorithms. Through experiments on the ALC cluster at Lawrence Livermore National Laboratory, which connects 960 dual-CPU nodes with the Quadrics QsNet interconnect, we find NIC-based reductions to be more efficient than host-based implementations. At large scale, our NIC-based reductions are more than twice as fast as the host-based, production-level MPI implementation.

Keywords: cluster computing; reduce; allreduce; Quadrics QsNet; NIC-based operations; network interface cards; collective communication; reduction algorithms.

DOI: 10.1504/IJHPCN.2006.010635

International Journal of High Performance Computing and Networking, 2006, Vol. 4, No. 3/4, pp. 122-136

Published online: 10 Aug 2006
