Title: A guided oversampling technique to improve the prediction of software fault-proneness for imbalanced data
Authors: Raed Shatnawi; Ziad Al-Sharif
Addresses: Software Engineering Department, Jordan University of Science and Technology, Irbid, 22110, Jordan; Software Engineering Department, Jordan University of Science and Technology, Irbid, 22110, Jordan
Abstract: Fault-proneness is one of the most studied quality factors in the field of software quality. Predicting the probability that a class is faulty gives developers the information they need to improve software quality and to reduce the costs of testing and maintenance. The performance of fault prediction models suffers greatly from the imbalanced distribution of faults, i.e., the majority of modules are not faulty and only a minority are faulty. In this paper, we discuss many oversampling techniques that are used to improve the performance of prediction models. We propose to guide the oversampling process using the fault content (i.e., the number of faults in a module). The study is conducted on a large object-oriented system, Eclipse. The proposed oversampling is tested on ten classifiers. The results show that using fault content in sampling yields better prediction performance than traditional oversampling techniques. Decision trees and nearest neighbours show outstanding performance, whereas the other classifiers show acceptable performance.
Keywords: fault proneness; imbalanced data; CK metrics; data mining; oversampling; fault prediction; software faults; software errors; error prediction; software quality; fault distribution; prediction models; object-oriented metrics; fault content; sampling; decision trees; nearest neighbour.
International Journal of Knowledge Engineering and Data Mining, 2012 Vol.2 No.2/3, pp.200 - 214
Received: 08 May 2021
Accepted: 12 May 2021
Published online: 29 Dec 2012