Title: Performance evaluation of oversampling algorithm: MAHAKIL using ensemble classifiers

Authors: C. Arun; C. Lakshmi

Addresses: Department of Computational Intelligence, School of Computing, SRMIST, Chennai, Tamil Nadu, India; School of Computing, SRM Institute of Science and Technology, India

Abstract: Class imbalance is a well-known problem in real-world applications: a disparity in the number of samples per class leads to biased classifier performance. Many sampling techniques have been proposed to address it, falling into either oversampling, which resolves the issue to a greater extent, or undersampling. MAHAKIL is a diversity-based oversampling approach inspired by the theory of inheritance, in which minority samples are synthesised to balance the classes using the Mahalanobis distance measure. In this study, the performance of the MAHAKIL algorithm is tested using various ensemble classifiers, which have proved effective owing to their multi-hypothesis learning approach and better performance. The results of experiments conducted on 20 imbalanced software defect prediction datasets using six different ensemble approaches show that XGBoost provides better performance and a reduced false alarm rate compared to the other models.
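
To illustrate two quantities named in the abstract, the following minimal Python sketch computes squared Mahalanobis distances of minority-class samples from their class mean, and a false alarm rate taken here as FP / (FP + TN). It is not the authors' MAHAKIL implementation: the function names `mahalanobis_rank` and `false_alarm_rate`, the use of a pseudo-inverse for the covariance matrix, and the descending sort order are assumptions made for illustration only.

```python
import numpy as np


def mahalanobis_rank(X_min):
    """Squared Mahalanobis distance of each minority sample from the
    minority-class mean, plus sample indices sorted farthest-first.

    Hypothetical helper for illustration; not taken from the paper.
    """
    X = np.asarray(X_min, dtype=float)
    diff = X - X.mean(axis=0)
    # Sample covariance of the minority class (rows are samples).
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # Assumption: a pseudo-inverse keeps this usable when the
    # covariance matrix is singular (few samples, correlated features).
    cov_inv = np.linalg.pinv(cov)
    # d2[i] = diff[i] @ cov_inv @ diff[i]
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    order = np.argsort(-d2)  # descending distance
    return d2, order


def false_alarm_rate(y_true, y_pred):
    """False alarm rate FP / (FP + TN), with label 1 = defective."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    return fp / (fp + tn) if (fp + tn) else 0.0
```

A lower false alarm rate means fewer non-defective modules are wrongly flagged as defective, which is the sense in which the abstract reports a reduced rate for XGBoost.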

Keywords: class imbalance; software fault prediction; synthetic samples; over sampling techniques; MAHAKIL; false alarm rate; evolutionary algorithm; ensemble; inheritance.

DOI: 10.1504/IJBIDM.2023.127293

International Journal of Business Intelligence and Data Mining, 2023 Vol.22 No.1/2, pp.1 - 15

Received: 26 Aug 2021
Accepted: 17 Sep 2021

Published online: 30 Nov 2022
