Authors: Masoumeh Zareapoor; Pourya Shamsolmoali
Addresses: Department of Computer Science, Jamia Hamdard University, New Delhi, 110062, India; Department of Computer Science, Jamia Hamdard University, New Delhi, 110062, India
Abstract: Mining imbalanced data is an important problem for both algorithm design and performance evaluation. When a dataset is imbalanced, a classification technique does not treat the two classes equally. Standard classifiers are poorly suited to imbalanced data, since they tend to assign all instances to the majority class, which is usually the less important class. Additionally, some performance measures, such as accuracy, which is known to be a biased metric for imbalanced data, are not reliable in this setting. In this paper, we first apply various commonly used techniques for handling class imbalance before passing the data to the classifiers. However, the performance of the classifiers is found to degrade because of the highly imbalanced nature of the datasets. Hence, we propose an integrated sampling technique with an ensemble of AdaBoost to improve prediction performance. We also show empirically which performance measures are more appropriate for mining imbalanced datasets.
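The point about accuracy being a biased metric can be illustrated with a minimal sketch. The data below are hypothetical (990 majority and 10 minority instances, not taken from the paper), and the helper names `confusion_counts`, `accuracy`, `recall`, and `balanced_accuracy` are introduced here only for illustration. Balanced accuracy is used as one example of an imbalance-aware measure; the abstract does not say which measures the paper recommends.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all instances classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    """Fraction of minority instances that were found (0.0 if there are none)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of the per-class recalls (minority recall and majority recall)."""
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (recall(tp, fn) + specificity) / 2


# Hypothetical imbalanced dataset: 990 majority (0) and 10 minority (1) labels.
y_true = [0] * 990 + [1] * 10
# A trivial classifier that assigns every instance to the majority class.
y_pred = [0] * 1000

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, tn, fn)
minority_recall = recall(tp, fn)
bal_acc = balanced_accuracy(tp, fp, tn, fn)

print(f"accuracy={acc:.2f} minority_recall={minority_recall:.2f} "
      f"balanced_accuracy={bal_acc:.2f}")
```

In this toy setting the majority-only classifier scores high on accuracy while finding none of the minority instances, which is the kind of bias the abstract refers to.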
Keywords: imbalanced dataset; classification; re-sampling; ensemble.
International Journal of Information and Communication Technology, 2018 Vol.13 No.2, pp.186 - 195
Received: 17 Feb 2015
Accepted: 16 Jul 2015
Published online: 14 Mar 2018