Title: Exploiting partitioned synchrony to implement accurate failure detectors

Authors: Raimundo José de Araújo Macêdo; Sérgio Gorender

Addresses: Distributed System Laboratory (LaSiD), Computer Science Department, Federal University of Bahia, Campus de Ondina, 40170-110, Salvador, Brazil (both authors)

Abstract: We exploit the concept of partitioned synchrony to show that accurate failure detectors can be implemented in a non-synchronous distributed system. To this end, we introduce the partitioned synchronous system (Spa), which is weaker than the conventional synchronous system. Given certain properties that we introduce and that must hold in Spa (such as strong partitioned synchrony), together with a trivially implementable timeliness oracle, we show how to implement a perfect failure detector P in Spa. Moreover, we show that even when strong partitioned synchrony does not hold, the existing synchronous partitions can still be used to improve the robustness of applications, by means of a partially perfect (and accurate) failure detector named xP. We also discuss how applications can benefit from these failure detectors and present related experimental data. The paper presents the properties and algorithms needed to implement P and xP, together with the corresponding correctness proofs.

Keywords: fault tolerance; modelling; perfect failure detectors; hybrid distributed systems; partitioned synchrony; partitioned synchronous systems.

DOI: 10.1504/IJCCBS.2012.050303

International Journal of Critical Computer-Based Systems, 2012, Vol. 3, No. 3, pp. 168-186

Published online: 16 Aug 2014
