Title: An approach to fault tolerance in the cloud using the checkpointing technique

Authors: Ghalem Belalem; Said Limam

Addresses: Department of Computer Science, Faculty of Sciences, University of Oran, B.P. 1524, EL M'Naouer, Oran, 31000, Algeria; Department of Computer Science, Faculty of Sciences, University of Oran, B.P. 1524, EL M'Naouer, Oran, 31000, Algeria

Abstract: Reliability refers to the probability that a system will offer failure-free service for a specified period of time within the bounds of a specified environment. For the cloud, reliability is broadly a function of the reliability of four individual components: 1) the hardware and software facilities offered by providers; 2) the provider's personnel; 3) connectivity to the subscribed services; 4) the subscriber's personnel. Providing redundant alternatives for every cloud component is too expensive. To reduce cost and build a highly reliable cloud within a limited budget, this paper proposes a fault-tolerant architecture for cloud computing that uses a dynamic and adaptive checkpoint mechanism.

Keywords: fault tolerance; cloud computing; virtualisation; checkpointing; resource management; reliability.

DOI: 10.1504/IJCNDS.2013.056221

International Journal of Communication Networks and Distributed Systems, 2013 Vol.11 No.3, pp.236 - 249

Received: 08 May 2021
Accepted: 12 May 2021

Published online: 13 Aug 2013
