Title: A three-stage framework for gene expression data analysis by L1-norm support vector regression

Authors: Hyunsoo Kim, Jeff X. Zhou, Herbert C. Morse III, Haesun Park

Addresses: Department of Computer Science, University of Minnesota, Twin Cities, 200 Union Street S.E., Minneapolis, MN 55455, USA; Laboratory of Immunopathology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, 5640 Fishers Lane, Rockville, MD 20852, USA; Laboratory of Immunopathology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, 5640 Fishers Lane, Rockville, MD 20852, USA; Department of Computer Science, University of Minnesota, Twin Cities, 200 Union Street S.E., Minneapolis, MN 55455, USA

Abstract: The identification of discriminative genes for categorical phenotypes in microarray gene expression data analysis has been extensively studied, especially for disease diagnosis. Recent biological experiments have also measured continuous phenotypes. For example, the extent of programmed cell death (apoptosis) can be measured by the level of the caspase 3 enzyme. An effective gene selection method for continuous phenotypes is therefore desirable. In this paper, we describe a three-stage framework for gene expression data analysis based on L1-norm support vector regression (L1-SVR). The first stage ranks genes by recursive multiple feature elimination based on L1-SVR. The second stage determines the minimal gene set by kernel regression, choosing the set that yields the lowest ten-fold cross-validation error. The last stage builds the final non-linear regression model with the minimal gene set and the optimal parameters found by leave-one-out cross-validation. The experimental results show a significant improvement over the current state-of-the-art approach, i.e., the two-stage process that consists of gene selection based on L1-SVR followed by the third stage of the proposed method.
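
The control flow of the three stages can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' implementation. Several pieces are assumptions made for the sketch: genes are scored by absolute Pearson correlation as a stand-in for L1-SVR, the kernel regression is a Gaussian Nadaraya-Watson estimator, half of the remaining genes are dropped per round, and the kernel width used in stage 2 and the `gammas` grid are arbitrary. All function names are hypothetical.

```python
import math
import random


def correlation_scores(X, y, genes):
    """Stand-in ranking score: |Pearson r| between each gene and the phenotype.
    The paper ranks by L1-SVR; this substitute keeps the sketch stdlib-only."""
    n = len(y)
    mean_y = sum(y) / n
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y))
    scores = {}
    for g in genes:
        col = [row[g] for row in X]
        mean_x = sum(col) / n
        sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in col))
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        scores[g] = abs(cov) / (sd_x * sd_y) if sd_x > 0 and sd_y > 0 else 0.0
    return scores


def rank_genes(X, y, score_fn, drop_fraction=0.5):
    """Stage 1: repeatedly rescore and drop several genes at once; best gene first."""
    remaining = list(range(len(X[0])))
    eliminated = []  # filled worst-first
    while len(remaining) > 1:
        scores = score_fn(X, y, remaining)
        remaining.sort(key=lambda g: scores[g])  # lowest score first
        n_drop = max(1, int(len(remaining) * drop_fraction))
        n_drop = min(n_drop, len(remaining) - 1)  # always keep one gene
        eliminated.extend(remaining[:n_drop])
        remaining = remaining[n_drop:]
    eliminated.extend(remaining)
    return eliminated[::-1]


def kernel_predict(X_train, y_train, x, genes, gamma):
    """Gaussian-kernel (Nadaraya-Watson) regression restricted to `genes`."""
    weights = [math.exp(-gamma * sum((row[g] - x[g]) ** 2 for g in genes))
               for row in X_train]
    total = sum(weights)
    if total == 0.0:  # every weight underflowed: fall back to the training mean
        return sum(y_train) / len(y_train)
    return sum(w * t for w, t in zip(weights, y_train)) / total


def cv_error(X, y, genes, gamma, n_folds):
    """Cross-validated mean squared error; n_folds == len(y) is leave-one-out."""
    n = len(y)
    squared_error = 0.0
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        for i in range(fold, n, n_folds):  # samples held out in this fold
            pred = kernel_predict(X_train, y_train, X[i], genes, gamma)
            squared_error += (pred - y[i]) ** 2
    return squared_error / n


def three_stage(X, y, gammas=(0.1, 1.0, 10.0)):
    ranking = rank_genes(X, y, correlation_scores)           # stage 1
    n_folds = min(10, len(y))
    best_k = min(range(1, len(ranking) + 1),                 # stage 2
                 key=lambda k: cv_error(X, y, ranking[:k], 1.0, n_folds))
    genes = ranking[:best_k]
    best_gamma = min(gammas,                                 # stage 3
                     key=lambda g: cv_error(X, y, genes, g, len(y)))
    return ranking, genes, best_gamma


def make_demo_data(n_samples=80, n_genes=8, seed=0):
    """Synthetic data: only genes 2 and 5 drive the continuous phenotype."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [3.0 * row[2] - 3.0 * row[5] + rng.gauss(0.0, 0.1) for row in X]
    return X, y
```

Calling `three_stage(*make_demo_data())` returns the full gene ranking, the selected gene subset, and the kernel width chosen by leave-one-out cross-validation. Because the stand-in score is a per-gene statistic, rescoring after each elimination round does not change the order here; with a multivariate model such as L1-SVR the rescoring step is what makes the elimination recursive.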

Keywords: gene expression data analysis; apoptosis; support vector regression; recursive multiple feature elimination; gene selection; continuous phenotype; bioinformatics; discriminative genes; gene expression microarray analysis.

DOI: 10.1504/IJBRA.2005.006902

International Journal of Bioinformatics Research and Applications, 2005 Vol.1 No.1, pp.51-62

Published online: 21 Apr 2005
