Title: Class imbalance and its effect on PCA preprocessing

Authors: T. Maruthi Padmaja; Bapi S. Raju; Rudra N. Hota; P. Radha Krishna

Addresses: Department of Computer Science and Engineering, K.L. University, Andhra Pradesh, India; Department of Computers and Information Sciences, University of Hyderabad, Hyderabad, Andhra Pradesh, India; Johann Wolfgang Goethe Universität, Frankfurt am Main, Germany; Infosys Labs, Infosys Limited, Manikonda, Andhra Pradesh, India

Abstract: The performance of classification models is vulnerable to the class imbalance problem, which occurs when one class of data severely outnumbers the other. Solutions have been proposed at both the data level and the algorithm level to improve model performance under this condition. Among these, resampling solutions, which preprocess the class information at the data level, have been applied successfully to many real-world class imbalance problems. Principal component analysis (PCA), meanwhile, is one of the prominent preprocessing solutions for improving classifier performance. PCA constructs a new subspace from the original attributes by maximising the global variance. This work explores the effect of class imbalance on the reduced subspace generated by PCA for two-class classification problems. The effect of class imbalance on PCA preprocessing is first studied on synthetic datasets, and the results are then validated on ten real-world datasets. The study reveals two major findings: 1) when the angular separation between the respective principal axes of the majority and minority classes is large, data imbalance clearly affects both the minority class prediction accuracy and the reconstruction of minority class data from the principal eigenvectors of the combined dataset; 2) balancing the class distribution is more crucial to improving classifier performance than PCA preprocessing.
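
The first finding can be illustrated with a small synthetic experiment. The sketch below is not the authors' code or experimental setup; it is a minimal NumPy illustration under assumed settings (two zero-mean 2-D Gaussian classes, 2000 majority and 50 minority samples, long axes 90 degrees apart, and a single retained component). All function names are hypothetical. It computes the angle between the per-class principal axes and compares how well each class is reconstructed from the leading eigenvector of the combined data.

```python
import numpy as np

def principal_axis(X):
    """Unit eigenvector of the covariance of X with the largest eigenvalue."""
    cov = np.cov(X - X.mean(axis=0), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1]

def angle_between_axes(u, v):
    """Angle in degrees between two axes, ignoring sign (0 to 90)."""
    c = abs(float(np.dot(u, v)))
    return float(np.degrees(np.arccos(np.clip(c, 0.0, 1.0))))

def reconstruction_error(X, axis, mean):
    """Mean squared error after projecting X onto one axis and back."""
    Xc = X - mean
    proj = np.outer(Xc @ axis, axis)
    return float(np.mean(np.sum((Xc - proj) ** 2, axis=1)))

def make_class(n, direction_deg, rng):
    """Elongated 2-D Gaussian cloud whose long axis points along direction_deg."""
    theta = np.radians(direction_deg)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    Z = rng.normal(size=(n, 2)) * np.array([3.0, 0.3])
    return Z @ R.T

# Assumed toy setting: 40:1 imbalance, class axes 90 degrees apart.
rng = np.random.default_rng(0)
majority = make_class(2000, 0.0, rng)
minority = make_class(50, 90.0, rng)

# PCA on the combined data, keeping only the leading component.
combined = np.vstack([majority, minority])
axis_all = principal_axis(combined)
mean_all = combined.mean(axis=0)

angle = angle_between_axes(principal_axis(majority), principal_axis(minority))
err_major = reconstruction_error(majority, axis_all, mean_all)
err_minor = reconstruction_error(minority, axis_all, mean_all)

print("angle between class principal axes (deg):", round(angle, 1))
print("majority reconstruction error:", round(err_major, 3))
print("minority reconstruction error:", round(err_minor, 3))
```

In this toy setting the leading eigenvector of the combined data follows the majority class, so the minority class is reconstructed far less accurately. Changing the minority direction to 0 degrees, or oversampling the minority class before computing `axis_all`, is one way to see how the gap depends on angular separation and class balance.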

Keywords: classification algorithms; principal component analysis; class imbalance problem; resampling techniques; PCA preprocessing.

DOI: 10.1504/IJKESDP.2014.064265

International Journal of Knowledge Engineering and Soft Data Paradigms, 2014 Vol.4 No.3, pp.272 - 294

Published online: 30 Aug 2014
