Title: Boosting prediction performance on imbalanced dataset

Authors: Masoumeh Zareapoor; Pourya Shamsolmoali

Addresses: Department of Computer Science, Jamia Hamdard University, New Delhi, 110062, India ' Department of Computer Science, Jamia Hamdard University, New Delhi, 110062, India

Abstract: Mining imbalanced data is an important problem for both algorithm design and performance evaluation. When a dataset is imbalanced, a classifier does not treat the two classes equally. Standard classifiers are poorly suited to imbalanced data, since they tend to assign all instances to the majority class, which is usually the less important class. In addition, some performance measures, such as accuracy, are known to be biased on imbalanced data and do not reflect classifier quality well. In this paper, we first apply several commonly used techniques for handling class imbalance before passing the data to the classifiers. However, classifier performance still degrades because the datasets are highly imbalanced. We therefore propose an integrated sampling technique combined with an AdaBoost ensemble to improve prediction performance. We also show empirically which performance measures are more appropriate for mining imbalanced datasets.
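
The bias of accuracy on imbalanced data can be illustrated with a minimal sketch. It assumes a hypothetical dataset of 10 minority and 990 majority instances and a degenerate classifier that always predicts the majority class. The alternative measures shown here (recall, F1 and G-mean) are common choices for imbalanced problems and are an assumption of this example; the abstract does not name the measures the paper recommends.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Compute accuracy alongside measures that are sensitive to the minority class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # minority-class recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # majority-class recall
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    g_mean = sqrt(recall * specificity)
    return {"accuracy": accuracy, "recall": recall, "f1": f1, "g_mean": g_mean}


# Hypothetical data: label 1 is the rare (important) class.
y_true = [1] * 10 + [0] * 990
# A degenerate classifier that assigns every instance to the majority class.
y_pred = [0] * 1000

metrics = imbalance_metrics(y_true, y_pred)
print(metrics)
```

In this sketch the always-majority classifier reaches 99% accuracy while its recall, F1 and G-mean are all zero, because it never identifies a single minority instance.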

Keywords: imbalanced dataset; classification; re-sampling; ensemble.

DOI: 10.1504/IJICT.2018.090556

International Journal of Information and Communication Technology, 2018 Vol.13 No.2, pp.186 - 195

Available online: 14 Mar 2018
