Title: Software fault proneness prediction: a comparative study between bagging, boosting, and stacking ensemble and base learner methods

Authors: Mohammed Akour; Izzat Alsmadi; Iyad Alazzam

Addresses: Department of Computer Information Systems, Yarmouk University, P.O. Box 566, Irbid 21163, Jordan; University of New Haven, West Haven, CT 06516, USA; Department of Computer Information Systems, Yarmouk University, P.O. Box 566, Irbid 21163, Jordan

Abstract: Defective modules may be a prime cause of reduced software quality and increased maintenance cost. Predicting faulty modules in systems under test at an early stage therefore contributes to the overall quality of software products. In this research, three symmetric ensemble methods (bagging, boosting, and stacking) are used with 11 base learners to predict faulty modules, and the performance of each ensemble is compared with that of its base learner. The results reveal that the defect prediction performance of the base learner and of the corresponding ensemble classifiers is the same for the naïve Bayes, Bayes net, PART, random forest, IB1, VFI, decision table, and NB tree base learners; the case was different for boosted SMO, bagged J48, and boosted and bagged random tree. In addition, the results showed that random forest is one of the most significant classifiers to stack with other classifiers to achieve better fault prediction.
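The abstract's central comparison is between a base learner and the ensemble built on it. As a rough illustration of that comparison, the stdlib-only Python sketch below trains one base learner and a bagged ensemble of the same learner on the same data, then reports the test accuracy of each. It rests on loud assumptions: the decision-stump base learner, the synthetic two-metric module data, and every function name (`make_modules`, `train_stump`, `train_bagging`, and so on) are hypothetical stand-ins, not the WEKA learners or defect data sets used in the study, and boosting and stacking are not shown.

```python
import random
from collections import Counter


# Hypothetical synthetic "modules": two made-up metrics and a 0/1 faulty label.
# This is NOT the study's data; it only gives the learners something to fit.
def make_modules(n, seed):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        loc = rng.uniform(0, 100)  # hypothetical size metric
        cc = rng.uniform(0, 20)    # hypothetical complexity metric
        score = loc / 100 + cc / 20 + rng.gauss(0, 0.3)
        data.append(((loc, cc), 1 if score > 1.0 else 0))
    return data


def predict_stump(stump, x):
    # A stump is (feature index, threshold, flip); flip=1 inverts the prediction.
    feature, threshold, flip = stump
    return int(x[feature] > threshold) ^ flip


def train_stump(data):
    """Hypothetical base learner: the single split with the best training accuracy."""
    best, best_correct = None, -1
    for feature in range(len(data[0][0])):
        for x, _ in data:  # candidate thresholds are the observed feature values
            for flip in (0, 1):
                stump = (feature, x[feature], flip)
                correct = sum(predict_stump(stump, xi) == yi for xi, yi in data)
                if correct > best_correct:
                    best, best_correct = stump, correct
    return best


def train_bagging(data, n_models, seed):
    """Bagging: fit each copy of the base learner on a bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the training set, drawn with replacement.
        sample = [rng.choice(data) for _ in range(len(data))]
        models.append(train_stump(sample))
    return models


def predict_bagging(models, x):
    # Majority vote; an odd number of models avoids ties for 0/1 labels.
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


train, test = make_modules(150, seed=1), make_modules(150, seed=2)
stump = train_stump(train)
bagged = train_bagging(train, n_models=15, seed=3)
base_acc = accuracy(lambda x: predict_stump(stump, x), test)
bag_acc = accuracy(lambda x: predict_bagging(bagged, x), test)
print(f"base stump: {base_acc:.3f}  bagged stumps: {bag_acc:.3f}")
```

Bagging changes only the training sample that each copy of the base learner sees, so it tends to matter most for learners whose fitted model changes a lot with the sample (such as tree learners) and little for stable ones. Swapping a different base learner into `train_bagging` is the kind of per-learner comparison the abstract describes.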

Keywords: software defect prediction; bagging; boosting; stacking; data mining; software defects; software faults; software testing; software development; software quality; ensemble classifiers; base learner classifier; random forest.

DOI: 10.1504/IJDATS.2017.083058

International Journal of Data Analysis Techniques and Strategies, 2017, Vol. 9, No. 1, pp. 1-16

Received: 27 Apr 2015
Accepted: 23 Sep 2015

Published online: 20 Mar 2017
