Title: Analyse resilience risks in microservice architecture systems with causality search and inference algorithms

Authors: Kanglin Yin; Qingfeng Du; Juan Qiu

Addresses: School of Software Engineering, Tongji University, 4800 Cao'an Highway, Shanghai, China (all three authors)

Abstract: The microservice architecture has become the mainstream architecture pattern for web service applications in recent years. However, compared with traditional software architectures, it has a more sophisticated deployment structure, which exposes it to more potential risks and a greater diversity of fault symptoms. Microservice practitioners have started to use the word 'resilience' to describe the capability of coping with unexpected conditions. How to judge whether a system environment disruption is a risk to microservice resilience, and how to analyse resilience risks before the system is released, are research questions in microservice development. As the practice of chaos engineering has solved the problem of resilience risk identification, this paper focuses on how to analyse identified resilience risks in microservice architecture systems and proposes a resilience risk analysis method. Based on performance monitoring data collected during chaos experiments, the method uses a causality search algorithm to build causality graphs of performance indicators and a causality inference algorithm to generate causality chains for system operators. The effectiveness of the proposed approach is demonstrated through a case study on a microservice architecture system.

Keywords: microservice; resilience; software risk analysis; causality search and inference.

DOI: 10.1504/IJWGS.2020.107921

International Journal of Web and Grid Services, 2020 Vol. 16 No. 2, pp. 147-171

Received: 03 Oct 2019
Accepted: 12 Feb 2020

Published online: 30 Jun 2020
