Title: Semantic similarity based feature extraction from microarray expression data

Authors: Young-Rae Cho, Aidong Zhang, Xian Xu

Addresses: Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY 14260, USA; Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY 14260, USA; Microsoft Corporation, Redmond, WA 98052, USA

Abstract: Previous studies have proven that it is feasible to build sample classifiers using gene expression profiles. To build an effective sample classifier, a dimension reduction process is necessary, since classic pattern recognition algorithms do not work well in high-dimensional space. In this paper, we present a novel feature extraction algorithm that integrates microarray expression data with Gene Ontology (GO). Applying semantic similarity measures, we identify groups of genes, called virtual genes, that potentially interact with each other for a biological function. The correlation in the expression of virtual genes is used to classify samples. For colon cancer data, this approach significantly improved the classification accuracy by more than 10%.

Keywords: feature extraction; microarray expression data; semantic similarity; bioinformatics; gene ontology; virtual genes; colon cancer data; sample classifiers; gene expression profiles; classification accuracy.

DOI: 10.1504/IJDMB.2009.026705

International Journal of Data Mining and Bioinformatics, 2009 Vol.3 No.3, pp.333 - 345

Published online: 23 Jun 2009
