Authors: Yifei Mao; Yuansheng Yang
Addresses: School of Computer Science and Technology, Dalian University of Technology, Dalian, China; School of Computer Science and Technology, Dalian University of Technology, Dalian, China
Abstract: Identifying discriminative features in information-rich data for clinical diagnosis is crucial in biomedical science. Support Vector Machine Recursive Feature Elimination (SVM-RFE), an efficient feature selection method, has been widely applied in this domain with remarkable results. However, biological data are usually class-imbalanced and contain outliers, both of which strongly affect the feature ranking produced by SVM-RFE. This paper proposes a new feature selection method based on SVM-RFE and Effective Range (SVM-RFE-ER). The proposed method ranks features by combining the SVM weight with a feature weight based on the effective ranges. Experiments on simulated and real datasets show that SVM-RFE-ER is robust, especially against outliers and imbalanced data, and that it is effective in identifying biologically meaningful biomarkers for disease study.
Keywords: feature selection; imbalanced data; outlier data; effective range; SVM-RFE.
International Journal of Wireless and Mobile Computing, 2018 Vol.15 No.2, pp.105 - 112
Received: 05 Jul 2017
Accepted: 08 Mar 2018
Published online: 09 Oct 2018
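
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of plain SVM-RFE only. It is not the authors' SVM-RFE-ER: the abstract does not give the effective-range weight or the rule for combining it with the SVM weight, so neither is attempted here. The function names (`train_linear_svm`, `svm_rfe`), the hyperparameters, the simple subgradient trainer, and the toy data are all assumptions made for illustration.

```python
# Sketch of baseline SVM-RFE (not SVM-RFE-ER). All names and settings are hypothetical.

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit a linear SVM by batch subgradient descent on the
    L2-regularised hinge loss. Labels in y must be -1 or +1.
    Returns the weight vector w (the bias is fitted but not returned)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin: hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w


def svm_rfe(X, y):
    """Recursive feature elimination: repeatedly train a linear SVM on the
    surviving features and drop the one with the smallest squared weight.
    Returns feature indices ordered from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]  # last eliminated = most important


# Toy data (made up): feature 0 separates the classes, features 1 and 2 are noise.
X = [
    [2.0, 0.1, -0.2],
    [1.5, -0.3, 0.1],
    [2.2, 0.2, 0.3],
    [-2.0, 0.2, -0.1],
    [-1.8, -0.1, 0.2],
    [-2.1, 0.0, -0.3],
]
y = [1, 1, 1, -1, -1, -1]

ranking = svm_rfe(X, y)
print(ranking)
```

On this toy data the informative feature 0 is the last to be eliminated and so heads the ranking. A method along the lines the abstract describes would replace the ranking criterion `w[k] ** 2` with a score that also incorporates an effective-range-based feature weight; how that score is defined is left to the paper.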