Title: Two-stage gene selection for support vector machine classification of microarray data

Authors: Xiao-Lei Xia, Kang Li, George W. Irwin

Addresses: School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Ashby Building, Stranmillis Road, Belfast BT9 5AH, UK (all three authors)

Abstract: This paper proposes a new, stable gene selection method for support vector machine (SVM) classification of microarray data, aiming to improve classification accuracy. A two-stage algorithm selects genes and yields a compact multivariate linear regression model that contains fewer genes than the number of experiments, together with a weight vector for each gene index. An SVM then learns the microarray data based on this linear regression model. Experimental results on two well-known microarray datasets show that SVMs with two-stage gene selection maintain consistently high accuracy with a small number of genes. The proposed method also outperforms two other typical gene selection methods, the baseline method and significance analysis of microarrays, in terms of accuracy.
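
The abstract does not spell out the two stages, so the following is only a rough illustration of the general idea, not the authors' algorithm. It assumes that stage one is a greedy forward selection of genes by least-squares residual error, and that stage two revisits each selected gene and swaps it for an unselected one whenever that lowers the error. The function names `residual_ss` and `two_stage_select`, the no-intercept model, and the numerical tolerances are all hypothetical choices made for this sketch.

```python
def residual_ss(columns, y):
    """Residual sum of squares of y after a least-squares fit on `columns`.

    Uses Gram-Schmidt orthogonalisation; no intercept term is fitted
    (an assumption of this sketch, not a detail taken from the paper).
    """
    residual = list(y)
    basis = []
    for col in columns:
        q = list(col)
        # Remove the components of this column that lie along earlier ones.
        for b in basis:
            coef = sum(qi * bi for qi, bi in zip(q, b)) / sum(bi * bi for bi in b)
            q = [qi - coef * bi for qi, bi in zip(q, b)]
        norm2 = sum(qi * qi for qi in q)
        if norm2 < 1e-12:  # column is (numerically) linearly dependent; skip it
            continue
        basis.append(q)
        coef = sum(ri * qi for ri, qi in zip(residual, q)) / norm2
        residual = [ri - coef * qi for ri, qi in zip(residual, q)]
    return sum(ri * ri for ri in residual)


def two_stage_select(genes, y, k):
    """Return the indices of k genes chosen by a forward pass plus a review pass.

    genes: list of expression profiles, one list of sample values per gene.
    y:     class labels encoded as numbers (for example +1 / -1).
    k:     number of genes to keep; must be below the number of samples.
    """
    if not 0 < k < len(y):
        raise ValueError("k must be positive and smaller than the number of samples")
    n_genes = len(genes)

    def error(indices):
        return residual_ss([genes[i] for i in indices], y)

    # Stage 1 (assumed): greedy forward selection, one gene at a time.
    selected = []
    for _ in range(k):
        candidates = [g for g in range(n_genes) if g not in selected]
        selected.append(min(candidates, key=lambda g: error(selected + [g])))

    # Stage 2 (assumed): review each chosen gene and swap it out whenever
    # an unselected gene gives a strictly lower residual error.
    improved = True
    while improved:
        improved = False
        for pos in range(len(selected)):
            others = selected[:pos] + selected[pos + 1:]
            current = error(selected)
            for g in range(n_genes):
                if g in selected:
                    continue
                trial = error(others + [g])
                if trial < current - 1e-12:
                    selected[pos] = g
                    current = trial
                    improved = True
    return selected
```

In a full pipeline the expression profiles of the returned genes would then be passed to an SVM classifier; that step is omitted here because the abstract gives no details of the kernel or training settings.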

Keywords: support vector machines; SVM; two-stage linear regression; gene selection; baseline method; significance analysis; microarrays; microarray data classification.

DOI: 10.1504/IJMIC.2009.029029

International Journal of Modelling, Identification and Control, 2009 Vol.8 No.2, pp.164 - 171

Published online: 27 Oct 2009