Title: A greedy algorithm for gene selection based on SVM and correlation

Authors: Mingjun Song, Sanguthevar Rajasekaran

Addresses: Department of Computer Science and Engineering, University of Connecticut, Storrs 06269, CT, USA; Department of Computer Science and Engineering, University of Connecticut, Storrs 06269, CT, USA

Abstract: Microarrays give scientists a powerful and efficient tool for observing thousands of genes and analysing their activity in normal or cancerous tissues. In general, microarrays are used to measure the expression levels of thousands of genes in a cell mixture. Gene expression data obtained from microarrays can be used for various applications, one of which is gene selection. Gene selection is very similar to the feature selection problem addressed in machine learning. In a nutshell, gene selection is the problem of identifying a minimum set of genes that are responsible for certain events (for example, the presence of cancer). Informative gene selection is an important problem arising in the analysis of microarray data. In this paper, we present a novel algorithm for gene selection that combines Support Vector Machines (SVMs) with gene correlations. Experiments show that the new algorithm, called GCI-SVM, obtains a higher classification accuracy using a smaller number of selected genes than the well-known algorithms in the literature.

Keywords: gene selection; feature selection; SVM; support vector machines; cancer classification; microarrays; bioinformatics; greedy algorithms; gene correlation; gene expression; classification accuracy.

DOI: 10.1504/IJBRA.2010.034077

International Journal of Bioinformatics Research and Applications, 2010 Vol.6 No.3, pp.296 - 307

Accepted: 03 Nov 2009
Published online: 07 Jul 2010
