Title: Scoring and summarising gene product clusters using the Gene Ontology

Authors: Spiridon C. Denaxas, Christos Tjortjis

Addresses: School of Computer Science, University of Manchester, P.O. Box 88, Manchester M60 1QD, UK; School of Computer Science, University of Manchester, P.O. Box 88, Manchester M60 1QD, UK

Abstract: We propose an approach for quantifying the biological relatedness between gene products, based on their properties, and measure their similarities using exclusively statistical NLP techniques and Gene Ontology (GO) annotations. We also present a novel similarity figure of merit, based on the vector space model, which assesses gene expression analysis results and scores gene product clusters' biological coherency, making sole use of their annotation terms and textual descriptions. We define query profiles which rapidly detect a gene product cluster's dominant biological properties. Experimental results validate our approach, and illustrate a strong correlation between our coherency score and gene expression patterns.
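
Illustrative sketch (not taken from the article): the abstract does not give the formula for the coherency score, so the Python below only shows one generic way a vector space model could score a cluster from annotation text. It assumes TF-IDF term weighting and the mean pairwise cosine similarity as the score; the function names, the weighting scheme, and the toy annotation tokens are all hypothetical and should not be read as the authors' figure of merit.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Map each gene product to a sparse TF-IDF vector.

    docs: dict of gene product name -> list of annotation tokens.
    The weighting (tf * (log(n / df) + 1)) is an assumption for this sketch.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency: count each term once per product
    vectors = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        vectors[name] = {t: c * (math.log(n / df[t]) + 1.0) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_coherency(docs):
    """Mean pairwise cosine similarity over all gene products in one cluster."""
    if len(docs) < 2:
        raise ValueError("need at least two gene products to score a cluster")
    vectors = tfidf_vectors(docs)
    pairs = list(combinations(vectors, 2))
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


# Toy clusters with made-up annotation tokens (hypothetical data).
tight = {
    "geneA": ["protein", "kinase", "activity", "phosphorylation"],
    "geneB": ["protein", "kinase", "activity", "signal", "transduction"],
}
loose = {
    "geneC": ["protein", "kinase", "activity"],
    "geneD": ["ion", "transport", "membrane"],
}
print(cluster_coherency(tight) > cluster_coherency(loose))
```

Under these assumptions, a cluster whose members share annotation vocabulary scores closer to 1, and a cluster with no shared terms scores 0.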

Keywords: biomedical text; data mining; bioinformatics; gene ontology; vector space model; statistical NLP; nonlinear programming; gene expression analysis; gene product clusters; biological properties; annotation terms.

DOI: 10.1504/IJDMB.2008.020523

International Journal of Data Mining and Bioinformatics, 2008 Vol.2 No.3, pp.216 - 235

Published online: 29 Sep 2008
