Title: Structural Risk Minimisation based gene expression profiling analysis

Authors: Xue-wen Chen, Byron Gerlach, Dechang Chen, Zhenqiu Liu

Addresses: Bioinformatics and Computational Life Sciences Laboratory, Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA; Intel Corp., 1900 Prairie City Rd., Mail Stop: FM5-97, Folsom, CA 95630, USA; Division of Epidemiology and Biostatistics, Department of Preventive Medicine and Biometrics, Uniformed Services University of the Health Sciences, 4301 Jones Bridge Road, Bethesda, MD 20814, USA; Greenebaum Cancer Center, Department of Epidemiology and Preventive Medicine, University of Maryland Medicine, Baltimore, MD 21201, USA

Abstract: For microarray-based cancer classification, feature selection is a common method for improving classifier generalisation. Most wrapper methods use cross validation to evaluate feature sets. For small-sample problems such as microarray data, however, cross validation may overfit the data. In this paper, we propose a Structural Risk Minimisation (SRM) based method for gene selection in cancer classification. The SRM principle allows the probabilistic bound on the generalisation error to be reduced and thus avoids overfitting problems. The experimental results show that the proposed method performs significantly better than general wrapper methods that use cross validation.

Keywords: biomarker discovery; cancer classification; gene expression analysis; genetic algorithms; GA; machine learning; microarrays; multi-class feature selection; overfitting; structural risk minimisation; SRM; bioinformatics.

DOI: 10.1504/IJBRA.2007.013600

International Journal of Bioinformatics Research and Applications, 2007 Vol.3 No.2, pp.153 - 169

Published online: 09 May 2007
