Title: A hybrid approach to accelerate the classification accuracy of cervical cancer data with class imbalance problems

Authors: J. Samuel Manoharan; M. Braveen; G. Ganesan Subramanian

Addresses: Department of Electronics and Communication Engineering, Sir Isaac Newton College of Engineering & Technology, Nagapattinam, Tamil Nadu, India; School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu, India; Department of Electrical and Electronics Engineering, E.G.S. Pillay Engineering College, Nagapattinam, Tamil Nadu, India

Abstract: Cervical cancer is one of the most common gynaecological disorders worldwide. Although much of the research conducted over the past decade has focused on it, diagnosis continues to suffer from the imbalanced class distribution problem. Therefore, an oversampling approach combined with an unsupervised algorithm is proposed in this work. Several existing oversampling techniques have been used to address the class distribution problem, but with them the noise in the data gradually increases and no new knowledge is added, which leads to overfitting. Hence, the proposed technique combines principal component analysis (PCA) with the majority weighted minority oversampling technique (MWMOTE). The proposed framework is tested on two complete imbalanced data sets. Performance is evaluated using recall, precision and F-measure. The proposed automated framework produces better accuracy than existing approaches.
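The abstract names recall, precision and F-measure as the evaluation metrics. The following minimal Python sketch illustrates why these are commonly preferred over plain accuracy when classes are imbalanced. It is an illustration only, not the authors' implementation: the function names and the 95/5 label split are hypothetical and are not taken from the paper's data sets.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F-measure) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced labels: 95 negative (0) and 5 positive (1) cases.
y_true = [0] * 95 + [1] * 5

# Classifier A (hypothetical): always predicts the majority class.
pred_a = [0] * 100

# Classifier B (hypothetical): 3 false alarms, 4 of the 5 positives found.
pred_b = [0] * 92 + [1] * 3 + [1] * 4 + [0] * 1

for name, pred in (("A", pred_a), ("B", pred_b)):
    p, r, f = precision_recall_f1(y_true, pred)
    print(f"{name}: accuracy={accuracy(y_true, pred):.2f} "
          f"precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```

In this made-up example the two classifiers have nearly the same accuracy, yet classifier A never detects a positive case, which only the minority-class recall and F-measure reveal.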

Keywords: oversampling; dimension reduction; classification; synthetic minority oversampling technique; SMOTE analysis; imbalanced data set.

DOI: 10.1504/IJDMB.2021.122865

International Journal of Data Mining and Bioinformatics, 2021, Vol. 25, No. 3/4, pp. 234-261

Received: 27 Aug 2021
Accepted: 01 Apr 2022

Published online: 13 May 2022
