Title: Feature selection for genomic data sets through feature clustering

Authors: Fengbin Zheng, Xiajiong Shen, Zhengye Fu, Shanshan Zheng, Guangrong Li

Addresses: College of Computer and Information Engineering, Henan University, Kaifeng, Henan 475004, China; College of Computer and Information Engineering, Henan University, Kaifeng, Henan 475004, China; College of Computer and Information Engineering, Henan University, Kaifeng, Henan 475004, China; College of Computer and Information Engineering, Henan University, Kaifeng, Henan 475004, China; College of Accounting, Hunan University, Hunan 475004, China

Abstract: A subset selected by a supervised feature selection method may not be a good one for unsupervised learning, and vice versa. We propose a novel Feature Selection algorithm through Feature Clustering, FSFC. FSFC does not need the class label information in the data set and is suitable for both supervised and unsupervised learning. We test FSFC on several biological data sets for both clustering and classification analysis, and the results indicate that the FSFC algorithm can significantly reduce the size of the original data sets without sacrificing the quality of clustering and classification.
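
Illustrative sketch: the abstract does not specify how FSFC measures feature similarity or forms clusters, so the code below is not the FSFC algorithm. It is a minimal, generic example of label-free feature selection by feature clustering, under assumptions that are ours alone: absolute Pearson correlation as the similarity measure, a greedy single-pass grouping, the first feature of each group as its representative, and a hypothetical `threshold` parameter. The function names are likewise hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_by_feature_clustering(rows, threshold=0.9):
    """Group similar features and keep one representative per group.

    rows: list of samples, each a list of feature values (no class labels used).
    threshold: assumed similarity cutoff, not taken from the paper.
    Returns (selected feature indices, clusters of feature indices).
    """
    columns = list(zip(*rows))  # one tuple of values per feature
    clusters = []  # each cluster lists feature indices; the first is the representative
    for j, col in enumerate(columns):
        for cluster in clusters:
            representative = columns[cluster[0]]
            if abs(pearson(col, representative)) >= threshold:
                cluster.append(j)  # redundant with an existing representative
                break
        else:
            clusters.append([j])  # no similar cluster found: start a new one
    return [c[0] for c in clusters], clusters


# Toy data: feature 1 is exactly twice feature 0; feature 2 is only weakly related.
rows = [[1, 2, 5], [2, 4, 1], [3, 6, 4], [4, 8, 2]]
selected, clusters = select_by_feature_clustering(rows, threshold=0.9)
print(selected)  # [0, 2]
print(clusters)  # [[0, 1], [2]]
```

Because no class labels enter the computation, the reduced feature set from a sketch like this can be passed to either a clustering or a classification method.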

Keywords: feature selection; feature clustering; genomic data; bioinformatics; feature similarity; supervised learning; unsupervised learning; classification.

DOI: 10.1504/IJDMB.2010.032152

International Journal of Data Mining and Bioinformatics, 2010 Vol.4 No.2, pp.228 - 240

Published online: 11 Mar 2010