Title: A hybrid fault tolerance framework for SaaS services based on hidden Markov model

Authors: Feng Ye; Qian Huang; Zhijian Wang; Ling Li

Addresses: College of Computer and Information, Hohai University, Nanjing, China; Nanjing Longyuan Micro-Electronics Company, Nanjing, China ' College of Computer and Information, Hohai University, Nanjing, China; Nanjing Huiying Electronics Technology Corporation, Nanjing, China ' College of Computer and Information, Hohai University, Nanjing, China ' College of Computer and Information, Hohai University, Nanjing, China

Abstract: With the boom in cloud computing, more and more applications rely on cloud services to implement their critical business functions. However, the consequences of failures that cause service downtime or produce invalid results in such applications may range from mere inconvenience to significant monetary penalties or even loss of human life. Making cloud services highly dependable is therefore one of the main challenges for critical systems. Existing research shows that using fault injection to assess fault tolerance architectures for cloud services experimentally is still an open problem because of the complexity and diversity of failures in the cloud environment. Therefore, we propose a hybrid fault tolerance framework that utilises replication and design diversity techniques for SaaS services. To verify the effectiveness of the framework in various practical failure scenarios, a mixed fault simulator based on the urn-and-ball model of the hidden Markov model is introduced. A series of experiments is carried out to evaluate the reliability of a SaaS service in several configurations: a single service without replication, a single service with retry or reboot, and a service with spatial replication. The results show that the mixed fault simulator is flexible enough to simulate various faults in the cloud environment, and that both temporal and spatial redundancy improve the availability and reliability of the SaaS service.
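The idea of an urn-and-ball fault simulator, and of comparing a single service against temporal redundancy (retry) and spatial redundancy (replication), can be illustrated with a minimal Python sketch. It is not the authors' implementation: the urn names, the fault types, every probability, and all class and function names below are hypothetical, chosen only to show how a hidden Markov model can emit a mixed sequence of faults and how the three configurations might be compared on it.

```python
import random

# Hypothetical hidden states (urns) and observable outcomes (balls).
URNS = ["healthy", "degraded", "failing"]
BALLS = ["ok", "timeout", "invalid", "crash"]

# Illustrative urn-to-urn transition probabilities (each row sums to 1).
TRANSITIONS = {
    "healthy":  [0.90, 0.08, 0.02],
    "degraded": [0.30, 0.50, 0.20],
    "failing":  [0.20, 0.30, 0.50],
}

# Illustrative mix of balls in each urn (each row sums to 1).
EMISSIONS = {
    "healthy":  [0.97, 0.01, 0.01, 0.01],
    "degraded": [0.70, 0.15, 0.10, 0.05],
    "failing":  [0.20, 0.20, 0.20, 0.40],
}


class FaultSimulator:
    """One simulated service instance driven by an urn-and-ball HMM."""

    def __init__(self, seed, start="healthy"):
        self.rng = random.Random(seed)
        self.urn = start

    def invoke(self):
        # Draw a ball from the current urn, then move to the next urn.
        ball = self.rng.choices(BALLS, weights=EMISSIONS[self.urn])[0]
        self.urn = self.rng.choices(URNS, weights=TRANSITIONS[self.urn])[0]
        return ball


def call_once(replicas):
    """Single service without replication: one attempt on one instance."""
    return replicas[0].invoke() == "ok"


def call_with_retry(replicas, max_attempts=3):
    """Temporal redundancy: retry the same instance, stopping at the first success."""
    return any(replicas[0].invoke() == "ok" for _ in range(max_attempts))


def call_with_replication(replicas):
    """Spatial redundancy: invoke every replica and accept any valid result."""
    results = [replica.invoke() for replica in replicas]
    return "ok" in results


def success_rate(strategy, n_requests=10000, n_replicas=3, seed=0):
    """Fraction of requests that a strategy serves successfully."""
    replicas = [FaultSimulator(seed + i) for i in range(n_replicas)]
    return sum(strategy(replicas) for _ in range(n_requests)) / n_requests


if __name__ == "__main__":
    for strategy in (call_once, call_with_retry, call_with_replication):
        print(strategy.__name__, success_rate(strategy))
```

Under these assumed probabilities, the retry and replication strategies serve a larger fraction of requests than the single call. The sketch treats replicas as independent and ignores reboot, design diversity, and timing, all of which the framework described in the abstract addresses.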

Keywords: hidden Markov model; SaaS; fault tolerance; cloud services.

DOI: 10.1504/IJRS.2019.097022

International Journal of Reliability and Safety, 2019 Vol.13 No.1/2, pp.138 - 150

Received: 14 Sep 2017
Accepted: 13 Jun 2018

Published online: 14 Dec 2018
