Title: A survey on effects of class imbalance in data pre-processing stage of classification problem

Authors: Nitin Malave; Anant V. Nimkar

Addresses: Department of Computer Engineering, Sardar Patel Institute of Technology, Mumbai, India; Department of Computer Engineering, Sardar Patel Institute of Technology, Mumbai, India

Abstract: Classifier learning on datasets with an imbalanced class distribution is a challenging task, and the imbalance hinders the performance of machine learning algorithms. This imbalance occurs when one class is heavily outnumbered by another. Such data distributions in real-world applications have caught the attention of many researchers. This paper reviews various state-of-the-art sampling techniques and ensemble techniques for resolving class imbalance. It also investigates other factors, such as the threshold of distribution and inter- or within-class imbalance, that make class imbalance a more complex issue. Various approaches that alleviate the class imbalance problem, viz. data sampling, cost-sensitive methods, bagging and boosting, are compared in detail for their effects on the problem. Different parameters for measuring and evaluating the performance of a model are reviewed. Accuracy is the most widely used evaluation parameter in machine learning problems, but the review finds that other parameters, such as precision, recall and AU-ROC, also provide statistical measures for evaluating the model. The paper gives research directions in the domain of class imbalance problems.
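
The abstract's point about accuracy versus precision, recall and AU-ROC can be illustrated with a minimal sketch. It is not taken from the paper: the 95:5 class split, the always-majority classifier, and all function names below are assumptions chosen for illustration.

```python
# Illustrative sketch only: the data and helper names are hypothetical,
# not from the surveyed paper.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def precision_recall(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    # Convention assumed here: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def roc_auc(y_true, scores, positive=1):
    """AU-ROC as the fraction of (positive, negative) pairs ranked
    correctly by score, counting ties as half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical imbalanced dataset: 95 majority (0) and 5 minority (1) samples.
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class with a constant score.
y_pred = [0] * 100
scores = [0.0] * 100

acc = accuracy(y_true, y_pred)
prec, rec = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} auc={auc:.2f}")
```

Under these assumptions the classifier scores high on accuracy while detecting no minority samples, which is the kind of gap that precision, recall and AU-ROC are meant to expose.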

Keywords: machine learning; class imbalance; rare event detection; classification; re-sampling techniques.

DOI: 10.1504/IJCSYSE.2020.111203

International Journal of Computational Systems Engineering, 2020 Vol.6 No.2, pp.63 - 75

Received: 01 Jun 2019
Accepted: 21 Apr 2020

Published online: 26 Oct 2020
