Title: Broader dynamic load balancing for hybrid/multi-level parallel programming models

Authors: Ahmed S. Mohamed

Addresses: Department of Electrical and Computer Engineering, The George Washington University, Washington, DC 20052, USA

Abstract: Hybrid/multi-level parallel programming models have recently gained momentum because they have been shown to provide better scalability, speedup and utilisation than any single parallel programming model alone. In such models, load balancing should mean not only balancing computational load (as it has traditionally been understood), but also correcting I/O and synchronisation imbalance. In this paper, we propose a broader, generic multi-agent framework for dynamic load balancing that is independent of application, language and programming model. It takes most of the load-balancing burden away from programmers. It is not a library but a runtime support system that is not hardwired to the parallel applications. The framework is intended to handle varying levels of load change in computation, I/O and/or synchronisation throughout the application run, and it is an open architecture that currently supports four multi-level parallel programming models. It has a clean interface to the application, runs in parallel and provides additional functionality, such as determining when to load balance and providing an interface to end users. The framework has been deployed in four hybrid/multi-level parallel programming models, and its ability to issue corrective actions against emerging imbalances was tested in the context of an adaptive mesh refinement application. Experimental results show that the framework is effective in monitoring, tuning and rebalancing emerging computational, I/O and synchronisation sources of load imbalance.

Keywords: MPI; OpenMP; SHMEM; MLP; rebalancing; synchronisation; parallel programming; load balancing.

DOI: 10.1504/IJHPCN.2005.008034

International Journal of High Performance Computing and Networking, 2005 Vol.3 No.2/3, pp.171 - 187

Published online: 10 Nov 2005
