A study of data pre-processing techniques for imbalanced biomedical data classification
Online publication date: Thu, 20-Aug-2020
by Shigang Liu; Jun Zhang; Yang Xiang; Wanlei Zhou; Dongxi Xiang
International Journal of Bioinformatics Research and Applications (IJBRA), Vol. 16, No. 3, 2020
Abstract: Biomedical data are widely used to develop prediction models for identifying specific tumours, drug discovery and human cancer detection. However, previous studies have usually focused on comparing different classifiers and have overlooked the class imbalance problem in real-world biomedical datasets. This paper reviews and evaluates popular and recently developed resampling and feature selection (FS) methods for class imbalance learning, taking the data distribution into account. Experimental results show that: 1) resampling and FS techniques perform better when used with a support vector machine (SVM) classifier; 2) techniques such as random undersampling and FS perform better than other data pre-processing methods on data with a T location-scale distribution when using SVM and K-nearest neighbours (KNN) classifiers, while random oversampling outperforms other methods on data with a negative binomial distribution when using Random Forest at lower imbalance ratios; 3) FS outperforms other data pre-processing methods in most cases, so FS with an SVM classifier is the best choice for imbalanced biomedical data learning.
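To make the two resampling techniques named in the abstract concrete, the following is a minimal sketch of random undersampling and random oversampling on a toy dataset. It is not the authors' implementation: the function names `random_undersample` and `random_oversample`, the seed parameter and the toy data are assumptions made purely for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter


def _indices_by_class(y):
    """Group sample indices by class label."""
    groups = {}
    for i, label in enumerate(y):
        groups.setdefault(label, []).append(i)
    return groups


def random_undersample(X, y, seed=0):
    """Randomly discard samples so every class matches the smallest class.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    groups = _indices_by_class(y)
    n_min = min(len(idx) for idx in groups.values())
    keep = []
    for idx in groups.values():
        # Sample without replacement down to the minority-class size.
        keep.extend(rng.sample(idx, n_min))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, seed=0):
    """Randomly duplicate samples so every class matches the largest class.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    groups = _indices_by_class(y)
    n_max = max(len(idx) for idx in groups.values())
    keep = []
    for idx in groups.values():
        keep.extend(idx)
        # Draw with replacement to fill the gap up to the majority-class size.
        keep.extend(rng.choices(idx, k=n_max - len(idx)))
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy imbalanced data (assumed): 8 negative samples and 2 positive samples.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)

print(Counter(y_under))  # each class reduced to the minority count
print(Counter(y_over))   # each class raised to the majority count
```

Undersampling discards majority-class information, while oversampling repeats minority samples and can encourage overfitting. This trade-off is one reason a comparison across classifiers and data distributions, as the abstract describes, is informative.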