Title: Collective operations for wide-area message-passing systems using adaptive spanning trees

Authors: Hideo Saito, Kenjiro Taura, Takashi Chikayama

Addresses: Department of Information and Communication Engineering, University of Tokyo, Tokyo, Japan; Department of Information and Communication Engineering, University of Tokyo, Tokyo, Japan; Department of Frontier Informatics and Department of Information and Communication Engineering, University of Tokyo, Tokyo, Japan

Abstract: We propose a method for wide-area message-passing systems to perform broadcasts and reductions efficiently using latency- and bandwidth-aware spanning trees constructed at run-time. These trees are updated when processes join or leave a computation, allowing effective execution to continue. We have implemented our proposal on the Phoenix Message-Passing Library and performed experiments using 160 processors distributed across four clusters. Compared to a static Grid-aware implementation, the latency of our broadcast was within a factor of two, and its bandwidth was 82% of the static implementation's. When some processes joined or left a computation, our broadcast temporarily performed poorly, but completed successfully even during that time.
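
The idea of a latency-aware spanning tree that is rebuilt when membership changes can be illustrated with a minimal sketch. This is not the algorithm from the paper and does not use the Phoenix library: it assumes a hypothetical table of measured pairwise latencies, substitutes Prim's minimum spanning tree algorithm for the tree construction, ignores bandwidth, and only simulates message forwarding. All function names are illustrative.

```python
import heapq

def build_tree(latency, root):
    """Return {process: parent} for a minimum-latency spanning tree.

    `latency[u][v]` is an assumed, symmetric measured latency between
    processes u and v. Prim's algorithm is used here for illustration only.
    """
    parent = {root: None}
    heap = [(w, v, root) for v, w in latency[root].items()]
    heapq.heapify(heap)
    while heap and len(parent) < len(latency):
        _, v, u = heapq.heappop(heap)
        if v in parent:
            continue  # v was already attached through a cheaper edge
        parent[v] = u
        for x, wx in latency[v].items():
            if x not in parent:
                heapq.heappush(heap, (wx, x, v))
    return parent

def remove_process(latency, gone):
    """Return a copy of the latency table without the process `gone`."""
    return {u: {v: w for v, w in nbrs.items() if v != gone}
            for u, nbrs in latency.items() if u != gone}

def broadcast(parent, root):
    """Simulate forwarding down the tree; return the set of processes reached."""
    children = {p: [] for p in parent}
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)
    reached, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            reached.add(child)
            stack.append(child)
    return reached

if __name__ == "__main__":
    # Two hypothetical clusters: fast links inside, slow links between.
    latency = {
        "a0": {"a1": 0.1, "b0": 10.0, "b1": 12.0},
        "a1": {"a0": 0.1, "b0": 12.0, "b1": 12.0},
        "b0": {"a0": 10.0, "a1": 12.0, "b1": 0.1},
        "b1": {"a0": 12.0, "a1": 12.0, "b0": 0.1},
    }
    tree = build_tree(latency, "a0")
    print(tree, broadcast(tree, "a0"))
    # A process leaves: rebuild the tree over the remaining processes.
    tree = build_tree(remove_process(latency, "b0"), "a0")
    print(tree, broadcast(tree, "a0"))
```

In this toy setup the tree crosses the slow inter-cluster link only once while all four processes are present, and rebuilding after a departure keeps every remaining process reachable. A real system would update the tree incrementally rather than recompute it from a global table.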

Keywords: wide-area networks; WANs; message passing; collective operations; broadcast; reduction; adaptive spanning trees; latency.

DOI: 10.1504/IJHPCN.2008.020862

International Journal of High Performance Computing and Networking, 2008, Vol. 5, No. 3, pp. 179-188

Published online: 19 Oct 2008
