Title: A novel multi-stage feature selection method for microarray expression data analysis

Authors: Wei Du; Ying Sun; Yan Wang; Zhongbo Cao; Chen Zhang; Yanchun Liang

Addresses: College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China (all authors); College of Chemistry, Jilin University, Changchun 130012, China (Wei Du); College of Mathematics, Jilin University, Changchun 130012, China (Yan Wang)

Abstract: With the development of genome research, finding efficient methods to classify cancers and detect biomarkers has become a challenging problem. In this paper, we propose a novel multi-stage feature selection method that considers all kinds of genes in the original gene set. At successive stages, the method eliminates irrelevant, noisy and redundant genes and selects a subset of relevant genes. The proposed method is evaluated with different classifiers on the Leukemia, Prostate, Colon, Breast, Nervous and DLBCL microarray datasets, achieving best accuracies of 100%, 98.04%, 100%, 89.74%, 100% and 98.28%, respectively.

Keywords: feature selection; microarray expression data analysis; cancer classification; expression correlation analysis; disease biomarker identification; improved normalised signal to noise ratio; support vector clustering; bioinformatics; biomarkers; leukemia.
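The abstract describes the stages only in outline, and the formulas for the paper's "improved normalised signal to noise ratio" and its support vector clustering step are not given here. As a rough illustration of staged filtering in general, the Python sketch below substitutes two standard techniques: the classic Golub signal-to-noise ratio to rank genes by relevance, and a greedy Pearson-correlation filter to drop redundant genes. It is not the authors' algorithm; the function names (`snr_scores`, `pearson`, `multi_stage_select`), the parameters `top_k` and `max_corr`, and the toy data are all hypothetical.

```python
from statistics import mean, pstdev


def snr_scores(X, y, eps=1e-12):
    """Classic (Golub-style) signal-to-noise ratio for every gene.

    X: list of samples, each a list of expression values (one per gene).
    y: list of binary class labels (0 or 1), one per sample.
    Returns |mean_0 - mean_1| / (sd_0 + sd_1 + eps) per gene.
    """
    scores = []
    for g in range(len(X[0])):
        a = [x[g] for x, label in zip(X, y) if label == 0]
        b = [x[g] for x, label in zip(X, y) if label == 1]
        # eps avoids division by zero when a gene is constant within both classes.
        scores.append(abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + eps))
    return scores


def pearson(u, v):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mu, mv = mean(u), mean(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = sum((p - mu) ** 2 for p in u) ** 0.5
    sv = sum((q - mv) ** 2 for q in v) ** 0.5
    if su == 0 or sv == 0:
        return 0.0
    return cov / (su * sv)


def multi_stage_select(X, y, top_k=50, max_corr=0.9):
    """Hypothetical two-stage filter: relevance ranking, then redundancy removal."""
    scores = snr_scores(X, y)
    # Stage 1: keep the top_k genes by SNR, discarding weakly discriminative
    # (irrelevant or noisy) genes.
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:top_k]
    columns = {g: [x[g] for x in X] for g in ranked}
    # Stage 2: walk down the ranking and drop any gene whose absolute
    # correlation with an already selected gene reaches max_corr (redundant).
    selected = []
    for g in ranked:
        if all(abs(pearson(columns[g], columns[s])) < max_corr for s in selected):
            selected.append(g)
    return selected


if __name__ == "__main__":
    # Toy data: 6 samples x 4 genes; gene 1 tracks gene 0, gene 2 is flat
    # across classes, gene 3 differs only weakly between classes.
    X = [
        [5.0, 5.0, 1.0, 3.0],
        [5.2, 5.6, 1.1, 2.0],
        [4.9, 4.6, 0.9, 3.5],
        [1.0, 1.0, 1.0, 2.5],
        [1.2, 1.6, 1.1, 3.1],
        [0.8, 0.6, 0.9, 2.2],
    ]
    y = [0, 0, 0, 1, 1, 1]
    print(multi_stage_select(X, y, top_k=3, max_corr=0.9))  # → [0, 3]
```

With `top_k=3`, gene 2 (no class difference) is cut in the first stage, and gene 1 is dropped in the second because its correlation with gene 0 is about 0.99. Separating a relevance stage from a redundancy stage mirrors the abstract's description; the thresholds and scoring used in the paper itself are not stated in the abstract.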

DOI: 10.1504/IJDMB.2013.050977

International Journal of Data Mining and Bioinformatics, 2013 Vol. 7 No. 1, pp. 58-77

Received: 26 Aug 2010
Accepted: 10 Feb 2011

Published online: 20 Oct 2014
