Title: Transparent parallel checkpointing and migration in clusters and ClusterGrids

Authors: Jozsef Kovacs

Addresses: MTA SZTAKI, Parallel and Distributed Systems Laboratory, H1518 Budapest, P.O. Box 63, Hungary

Abstract: This paper introduces a novel approach to parallel checkpointing that supports fault tolerance and migration among the clusters of a ClusterGrid environment built from various middleware components. Based on an architectural analysis, compatibility and integrity requirements are identified and the corresponding conditions are established. Several available checkpointing systems are then examined for conformity with these conditions. Finally, a novel checkpointing approach is defined and the Parallel Grid Runtime and Application Development Environment (P-GRADE) Grid Programming Tool is adapted accordingly.

Keywords: message passing; parallel checkpointing; migration; clusters; grid computing; ClusterGrid; PVM; Condor; graphical programming environment; fault tolerance; middleware components.

DOI: 10.1504/IJCSE.2009.027379

International Journal of Computational Science and Engineering, 2009, Vol. 4, No. 3, pp. 171-181

Published online: 21 Jul 2009
