Title: A guided oversampling technique to improve the prediction of software fault-proneness for imbalanced data

Authors: Raed Shatnawi; Ziad Al-Sharif

Addresses: Software Engineering Department, Jordan University of Science and Technology, Irbid, 22110, Jordan; Software Engineering Department, Jordan University of Science and Technology, Irbid, 22110, Jordan

Abstract: Fault-proneness is one of the most studied quality factors in the field of software quality. Predicting the probability that a class is faulty gives developers the information they need in their endeavour to improve software quality and to reduce the costs of testing and maintenance. The performance of fault prediction models suffers greatly from the imbalance of the fault distribution, i.e., the majority of modules are not faulty whereas only a minority are faulty. In this paper, we discuss many oversampling techniques that are used to improve the performance of prediction models. We propose to guide the oversampling process using the fault content (i.e., the number of faults in a module). This study is conducted on a large object-oriented system, Eclipse. The proposed oversampling is tested on ten classifiers. The results of this work show that using fault content in sampling yields better prediction performance than traditional oversampling techniques. Decision trees and nearest neighbours showed outstanding performance, whereas the other classifiers showed acceptable performance.
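
Illustrative sketch: the abstract does not spell out the sampling procedure, so the following is only one possible reading of oversampling guided by fault content, not the authors' method. It assumes that faulty modules are replicated at random with probability proportional to their number of faults until the faulty and non-faulty classes are the same size. The function name `fault_guided_oversample`, the `(metrics, faults)` tuple layout, and the toy data are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical record layout: (metric vector, number of faults in the module).
Module = Tuple[List[float], int]


def fault_guided_oversample(modules: List[Module], seed: int = 0) -> List[Module]:
    """Duplicate faulty modules, weighted by fault count, until classes balance.

    Assumption: a module with more faults is replicated more often.
    """
    rng = random.Random(seed)
    faulty = [m for m in modules if m[1] > 0]
    clean = [m for m in modules if m[1] == 0]
    deficit = len(clean) - len(faulty)
    if not faulty or deficit <= 0:
        # Nothing to oversample, or the data is already balanced.
        return list(modules)
    # Sample with replacement; the weight of each module is its fault count.
    weights = [m[1] for m in faulty]
    extra = rng.choices(faulty, weights=weights, k=deficit)
    return list(modules) + extra


# Toy data: six non-faulty modules and two faulty ones (1 and 5 faults).
data: List[Module] = [([10.0, 2.0], 0)] * 6 + [([35.0, 7.0], 1), ([80.0, 12.0], 5)]
balanced = fault_guided_oversample(data)
n_faulty = sum(1 for _, faults in balanced if faults > 0)
n_clean = sum(1 for _, faults in balanced if faults == 0)
print(n_faulty, n_clean)  # 6 6
```

Under this assumption, the module with five faults is expected to be duplicated about five times as often as the module with one fault, whereas plain random oversampling would treat both equally.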

Keywords: fault proneness; imbalanced data; CK metrics; data mining; oversampling; fault prediction; software faults; software errors; error prediction; software quality; fault distribution; prediction models; object-oriented metrics; fault content; sampling; decision trees; nearest neighbour.

DOI: 10.1504/IJKEDM.2012.051241

International Journal of Knowledge Engineering and Data Mining, 2012 Vol.2 No.2/3, pp.200 - 214

Published online: 13 Sep 2014
