Title: Addressing statistical significance of fault injection: empirical studies of the soft error susceptibility

Authors: Qiang Guan; Nathan DeBardeleben; Sean Blanchard; Song Fu

Addresses: Ultrascale Systems Research Center, Los Alamos National Laboratory, Bikini Atoll Rd., SM 30, Los Alamos, NM 87545, USA; Ultrascale Systems Research Center, Los Alamos National Laboratory, Bikini Atoll Rd., SM 30, Los Alamos, NM 87545, USA; Ultrascale Systems Research Center, Los Alamos National Laboratory, Bikini Atoll Rd., SM 30, Los Alamos, NM 87545, USA; Department of Computer Science and Engineering, University of North Texas, USA

Abstract: Soft errors are becoming an important issue in computing systems. Near-threshold voltage (NTV) operation, shrinking circuit sizes, high performance computing (HPC), and high-altitude computing all present interesting challenges in this area. Much of the existing literature has focused on techniques to mitigate and measure soft errors at the hardware level. In this paper, we instead explore the soft error susceptibility of three common sorting algorithms at the software layer. We focus on the comparison operator and use our software fault injection tool to place faults with fine precision during the execution of these algorithms. We explore how the susceptibility of each algorithm varies with input and bit position, and we relate these faults back to the source code to study how algorithmic decisions affect the reliability of the codes. Finally, we examine how many fault injections are required for statistical significance. Using the standard equations from hardware fault injection practice, we calculate the number of injections that should be required to achieve confidence in our results. We then show empirically that more fault injections than this calculation suggests are needed before we can be confident in our experiments.
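The abstract does not say which sample-size equations it uses or which three sorting algorithms it studies. The sketch below is illustrative only, built on stated assumptions. It uses the statistical fault injection formula commonly attributed to Leveugle et al. (DATE 2009), n = N / (1 + e²(N − 1) / (t²·p(1 − p))), where N is the size of the fault space, e the error margin, t the normal cut-off for the chosen confidence level, and p the expected failure proportion. We assume, without confirmation from this record, that the paper's "standard practice equations" take this form. The fault model is also a hypothetical stand-in for the authors' tool: a single bit flip in the left operand of one comparison in an insertion sort on 32-bit two's-complement integers. The fault space is taken as (comparisons × bit positions). The names `required_injections`, `flip_bit`, `insertion_sort`, and `campaign` are our own.

```python
import math
import random


def required_injections(population, margin=0.01, confidence_t=2.576, p=0.5):
    """Sample size for statistical fault injection (formula as commonly
    attributed to Leveugle et al., 2009; assumed, not taken from this paper).
    population: fault-space size N; margin: error e;
    confidence_t: normal cut-off t (2.576 ~ 99% confidence);
    p: expected proportion of failing injections (0.5 is the worst case)."""
    n = population / (1 + margin ** 2 * (population - 1)
                      / (confidence_t ** 2 * p * (1 - p)))
    return math.ceil(n)


def flip_bit(value, bit, width=32):
    """Flip one bit of `value`, viewed as a width-bit two's-complement int.
    Assumes `value` fits in the signed width-bit range."""
    mask = (1 << width) - 1
    raw = (value & mask) ^ (1 << bit)
    # Reinterpret the top bit as the sign bit.
    return raw - (1 << width) if raw >> (width - 1) else raw


def insertion_sort(data, fault=None):
    """Insertion sort with an optional hypothetical fault = (k, bit): the
    left operand of the k-th comparison (0-based) is seen with `bit` flipped.
    Only the comparison is corrupted, not the stored element.
    Returns (output, number_of_comparisons)."""
    a = list(data)
    count = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            left = a[j]
            if fault is not None and count == fault[0]:
                left = flip_bit(left, fault[1])
            count += 1
            if not left > key:
                break
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a, count


def campaign(data, n_injections, width=32, seed=0):
    """Inject n single-bit faults at uniformly random (comparison, bit)
    sites and return the fraction that yield a wrongly ordered output."""
    rng = random.Random(seed)
    golden, total = insertion_sort(data)
    wrong = 0
    for _ in range(n_injections):
        fault = (rng.randrange(total), rng.randrange(width))
        out, _ = insertion_sort(data, fault)
        if out != golden:
            wrong += 1
    return wrong / n_injections


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.randint(-1000, 1000) for _ in range(20)]
    _, comparisons = insertion_sort(data)
    fault_space = comparisons * 32
    n = required_injections(fault_space, margin=0.05)
    rate = campaign(data, n)
    print(f"fault space={fault_space}, injections={n}, failure rate={rate:.3f}")
```

A faulty run executes identically to the fault-free run up to comparison k, so every sampled site is actually reached. Because insertion sort only moves elements, a corrupted comparison can misorder the output but never lose an element. Repeating `campaign` with growing injection counts and watching whether the estimated rate stabilizes is one way to probe empirically whether the formula's n is enough, which is the question the abstract raises.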

Keywords: soft error; fault injection; resilience; vulnerability; sorting algorithms.

DOI: 10.1504/IJHPCN.2017.086547

International Journal of High Performance Computing and Networking, 2017, Vol. 10, No. 4/5, pp. 436-452

Received: 12 Oct 2015
Accepted: 27 May 2016

Published online: 12 Sep 2017
