OS-level hang detection in complex software systems
Online publication date: Sun, 04-Sep-2011
by Antonio Bovenzi, Marcello Cinque, Domenico Cotroneo, Roberto Natella, Gabriella Carrozza
International Journal of Critical Computer-Based Systems (IJCCBS), Vol. 2, No. 3/4, 2011
Abstract: Many critical services are now provided by large and complex software systems. However, this increasing complexity introduces several sources of non-determinism, which may lead to hang failures: the system appears to be running, but some of its services are perceived as unresponsive. Online monitoring is the only way to detect and promptly react to such failures. However, for systems based on off-the-shelf components, online detection can be difficult, since instrumentation and log data collection may not be feasible in practice. This paper proposes a detection framework to cope with software hangs. The framework enables non-intrusive monitoring of complex systems, based on multiple sources of data gathered at the operating system (OS) level. The collected data are then combined to reveal hang failures. The framework is evaluated through a fault injection campaign on two complex systems from the air traffic management (ATM) domain. Results show that combining several monitors at the OS level is effective at detecting hang failures, in terms of coverage and false positives, with a negligible impact on performance.