Title: Significance analysis and improved discovery of disease-specific Differentially Co-expressed Gene Sets in microarray data

Authors: Haixia Li, R. Krishna Murthy Karuturi

Address (both authors): Computational and Mathematical Biology, Genome Institute of Singapore, A-STAR (Agency for Science, Technology and Research), 60 Biopolis Street, S138672, Republic of Singapore

Abstract: Kostka and Spang proposed a statistic (the KS-statistic) and an algorithm (the KS algorithm) to elicit Differentially Co-expressed Gene Sets (DCEGSs) by minimising the KS-statistic. We prove that the null distributions of the KS-statistic in the variance un-normalised and variance-normalised data settings are the central and the doubly non-central F-distribution, respectively. Based on this analysis, we propose two alternative but equivalent statistics whose null distributions are easier to evaluate. Further, we propose to improve the algorithm in two ways: by setting the search parameters objectively, through maximising the statistical significance of the resultant gene set, and by pre-filtering the genes with the Friendly Neighbours (FNs) algorithm.

Keywords: gene expression; microarray analysis; differential co-expression; FNs; friendly neighbours algorithm; statistical significance; disease-specific deregulated pathways; gene sets; bioinformatics.

DOI: 10.1504/IJDMB.2010.037544

International Journal of Data Mining and Bioinformatics, 2010 Vol.4 No.6, pp.617 - 638

Published online: 16 Dec 2010
