Title: Don't lose the point, check it: Is your cloud application using the right strategy?

Authors: Demis Gomes; Glauco Gonçalves; Patricia Endo; Moisés Rodrigues; Judith Kelner; Djamel Sadok; Calin Curescu

Addresses: Networking and Research Telecommunications Group (GPRT), Universidade Federal de Pernambuco (UFPE), Recife, Pernambuco, Brazil (Demis Gomes; Glauco Gonçalves; Patricia Endo; Moisés Rodrigues; Judith Kelner; Djamel Sadok) ' Ericsson Research Group, Kista, Sweden (Calin Curescu)

Abstract: Users pay to run their applications on cloud infrastructure and, in return, expect high availability and minimal data loss in case of failure. From a cloud provider's perspective, any hardware or software failure must be detected and recovered from as quickly as possible to maintain users' trust and avoid financial losses. From a user's perspective, failures must be transparent and should not impact application performance. To recover a failed application, cloud providers must perform checkpoints, periodically saving application data that can then be recovered after a failover. A checkpoint service can currently be implemented in many ways, each with different performance results. The main research question is therefore: which checkpoint strategy is best for a given set of user requirements? In this paper, we perform experiments with different checkpoint service strategies to understand how they are affected by computing resources. We also discuss the relationship between service availability and the checkpoint service.
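The checkpoint-and-failover idea in the abstract can be illustrated with a minimal Python sketch. It assumes an in-memory store and a fixed checkpoint interval counted in application updates; `CheckpointStore`, `Application`, and `failover` are hypothetical names, and the sketch does not model the SAF Checkpoint Service API or any of the strategies evaluated in the paper. It only shows the basic trade-off: state changed after the last checkpoint is lost on failover, so a shorter interval reduces data loss at the cost of more frequent saves.

```python
# Hypothetical sketch: class names, the in-memory store, and the update-count
# interval are illustrative assumptions, not the SAF Checkpoint Service API.
import copy
from dataclasses import dataclass, field


@dataclass
class CheckpointStore:
    """Stand-in for a checkpoint service that keeps the last saved replica."""
    snapshot: dict = field(default_factory=dict)

    def save(self, state: dict) -> None:
        # Deep-copy so later changes to the live state do not leak into the checkpoint.
        self.snapshot = copy.deepcopy(state)

    def restore(self) -> dict:
        # Hand back a copy so the standby cannot corrupt the stored checkpoint.
        return copy.deepcopy(self.snapshot)


class Application:
    """Toy stateful application that checkpoints every `interval` updates."""

    def __init__(self, store: CheckpointStore, interval: int) -> None:
        self.store = store
        self.interval = interval
        self.state = {"counter": 0}
        self.updates_since_checkpoint = 0

    def update(self) -> None:
        self.state["counter"] += 1
        self.updates_since_checkpoint += 1
        if self.updates_since_checkpoint >= self.interval:
            self.store.save(self.state)
            self.updates_since_checkpoint = 0


def failover(store: CheckpointStore, interval: int) -> Application:
    """Start a standby replica from the most recent checkpoint."""
    standby = Application(store, interval)
    standby.state = store.restore()
    return standby


if __name__ == "__main__":
    store = CheckpointStore()
    primary = Application(store, interval=5)
    for _ in range(12):
        primary.update()
    # The primary fails here: updates 11 and 12 were never checkpointed.
    standby = failover(store, interval=5)
    lost = primary.state["counter"] - standby.state["counter"]
    print(standby.state["counter"], lost)
```

Running it prints `10 2`: the standby resumes from the checkpoint taken at update 10, and the two updates made after that checkpoint are lost.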

Keywords: checkpoint; failover; performance evaluation; SAF standard.

DOI: 10.1504/IJGUC.2019.102735

International Journal of Grid and Utility Computing, 2019 Vol.10 No.6, pp.681 - 693

Received: 21 Apr 2018
Accepted: 19 Nov 2018

Published online: 09 Aug 2019
