Title: Comparison of feature selection and classification combinations for cancer classification using microarray data

Authors: Vijayan Vinaya, Nadeem Bulsara, Chetan J. Gadgil, Mugdha Gadgil

Addresses: Department of Bioinformatics, Dr. D.Y. Patil Biotechnology and Bioinformatics Institute, Akurdi, Pune 411044, India; Department of Bioinformatics, Dr. D.Y. Patil Biotechnology and Bioinformatics Institute, Akurdi, Pune 411044, India; Chemical Engineering and Process Development Division, National Chemical Laboratory, CSIR, Dr. Homi Bhabha Road, Pune 411008, India; Chemical Engineering and Process Development Division, National Chemical Laboratory, CSIR, Dr. Homi Bhabha Road, Pune 411008, India

Abstract: High-throughput gene expression data can be used to identify biomarker profiles for classification. The accuracy of microarray-based sample classification depends on both the algorithm used to select the features (genes) and the classification algorithm. We evaluated the performance of over 2000 combinations of feature selection and classification algorithms in classifying cancer datasets. One of these combinations (SVM for ranking genes + SMO) shows excellent classification accuracy using a small number of genes across the three cancer datasets tested. Notably, classification using 15 selected genes yields 96% accuracy for a dataset obtained on an independent microarray platform.

Keywords: gene expression; cancer classification; feature selection; feature classification; microarray; SVM; support vector machines; unbiased cross validation; bioinformatics; biomarker profiles; classification accuracy.

DOI: 10.1504/IJBRA.2009.027515

International Journal of Bioinformatics Research and Applications, 2009 Vol.5 No.4, pp.417 - 431

Received: 23 Jun 2008
Accepted: 01 Oct 2008

Published online: 28 Jul 2009
