Predicting robustness against transient faults of MPI based programs
Online publication date: Thu, 28-Apr-2016
by Joao Gramacho; Alvaro Wong; Dolores Rexachs; Emilio Luque
International Journal of Computational Science and Engineering (IJCSE), Vol. 12, No. 2/3, 2016
Abstract: Evaluating a program's behaviour in the presence of transient faults is often very time-consuming. Obtaining statistically significant data requires thousands of executions, and each execution incurs the considerable overhead of the fault injection environment. A previously published methodology significantly reduced the time needed to evaluate the robustness of a program execution by exhaustively analysing its execution trace instead of using fault injection. In this paper we further reduce the time needed to evaluate the robustness of parallel programs against transient faults by combining this methodology with PAS2P, a method that strives to describe an application based on its message-passing activity. This combination allowed us to predict the robustness of larger parallel programs. In some cases it reduced the time needed to calculate robustness by more than a factor of 20, with a robustness prediction error below 4%.
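To make the idea of combining per-phase analysis with an application signature more concrete, here is a minimal illustrative sketch. It is not the authors' method. It assumes (1) that PAS2P-style analysis yields a set of representative phases, each with a weight equal to how many times it repeats in the full run, and (2) that robustness is measured as the fraction of analysed fault sites classified as benign. Under those assumptions, robustness for the whole run can be extrapolated from one analysed occurrence of each phase. All names (`Phase`, `predict_robustness`, `relative_error`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """Hypothetical summary of one representative phase (assumed fields)."""
    name: str
    weight: int       # assumed: times the phase repeats in the full execution
    fault_sites: int  # assumed: fault sites analysed in one occurrence of the phase
    benign: int       # assumed: fault sites the trace analysis classified as benign


def phase_robustness(p: Phase) -> float:
    """Benign fraction for a single occurrence of the phase."""
    return p.benign / p.fault_sites


def predict_robustness(phases: list[Phase]) -> float:
    """Extrapolate whole-run robustness from the analysed phases.

    Each phase contributes weight * fault_sites sites, so phases that repeat
    often or contain many fault sites dominate the result.
    """
    total_sites = sum(p.weight * p.fault_sites for p in phases)
    total_benign = sum(p.weight * p.benign for p in phases)
    return total_benign / total_sites


def relative_error(predicted: float, measured: float) -> float:
    """Relative prediction error against a robustness value measured on the full run."""
    return abs(predicted - measured) / measured


if __name__ == "__main__":
    phases = [
        Phase("compute", weight=100, fault_sites=1000, benign=900),
        Phase("exchange", weight=10, fault_sites=5000, benign=4000),
    ]
    prediction = predict_robustness(phases)
    print(f"predicted robustness: {prediction:.4f}")
```

Weighting by `weight * fault_sites` rather than averaging the per-phase ratios keeps the prediction equal to the benign fraction that a full-run analysis would report, provided each repetition of a phase behaves like the analysed one. That equivalence is the key assumption behind this kind of extrapolation.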