Title: Template edge similarity graph clustering for mining multiple gene expression datasets

Authors: Saeed Salem

Addresses: Department of Computer Science, North Dakota State University, Fargo, ND 58108, USA

Abstract: High-throughput technologies have enabled the acquisition of large amounts of genomic data, including gene expression and RNA sequencing data for multiple species under various biological and environmental conditions. Recently, researchers have proposed methods for mining biological modules from gene co-expression networks. However, biological inference from a single expression dataset suffers from spurious co-expression. Integrating multiple gene expression datasets is a promising strategy for alleviating the challenges of protein functional annotation and biological module discovery that arise when only a single gene expression dataset is used. We propose an integrative mining algorithm that constructs a template edge similarity graph whose nodes are the co-expression edges; a weighted edge connecting two nodes represents the structural similarity of the corresponding two co-expression edges across the co-expression graphs. Clustering the weighted edge similarity graph yields recurrent co-expression link clusters (modules). Experimental results on human gene expression datasets show that the reported modules are functionally homogeneous, as evidenced by their enrichment with biological process GO terms.
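
The pipeline described in the abstract (co-expression edges become nodes, pairs of edges are weighted by their similarity across graphs, and the weighted graph is clustered) can be illustrated with a small sketch. The abstract does not define the structural similarity measure or the clustering algorithm, so everything specific here is an assumption made for illustration: similarity is taken to be the Jaccard overlap of the sets of graphs in which two gene-sharing edges occur, and clustering is a simple threshold followed by connected components. The function names and the `min_support` and `threshold` parameters are hypothetical and do not come from the paper.

```python
from itertools import combinations


def edge(u, v):
    """Return a canonical (sorted) form of an undirected co-expression edge."""
    return (u, v) if u <= v else (v, u)


def occurrence_sets(graphs):
    """Map each edge to the set of indices of the graphs that contain it."""
    occ = {}
    for i, g in enumerate(graphs):
        for u, v in g:
            occ.setdefault(edge(u, v), set()).add(i)
    return occ


def edge_similarity_graph(graphs, min_support=2):
    """Build a weighted graph whose nodes are recurrent co-expression edges.

    ASSUMPTION: two nodes are linked only if their edges share a gene, and
    the weight is the Jaccard overlap of their graph-occurrence sets. The
    paper's actual structural similarity may differ.
    """
    occ = {e: s for e, s in occurrence_sets(graphs).items()
           if len(s) >= min_support}
    weights = {}
    for e1, e2 in combinations(sorted(occ), 2):
        if set(e1) & set(e2):  # the two edges share at least one gene
            shared = len(occ[e1] & occ[e2])
            if shared:
                weights[(e1, e2)] = shared / len(occ[e1] | occ[e2])
    return sorted(occ), weights


def cluster(nodes, weights, threshold=0.5):
    """Group nodes joined by weights >= threshold (connected components).

    ASSUMPTION: a stand-in for whatever clustering method the paper uses.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), w in weights.items():
        if w >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    # Keep only clusters with more than one co-expression edge.
    return [g for g in groups.values() if len(g) > 1]


# Toy input: three co-expression graphs given as sets of gene pairs.
graphs = [
    {("A", "B"), ("B", "C"), ("A", "C"), ("X", "Y")},
    {("A", "B"), ("B", "C"), ("A", "C"), ("X", "Y"), ("Y", "Z")},
    {("A", "B"), ("B", "C"), ("Y", "Z"), ("X", "Y")},
]
nodes, weights = edge_similarity_graph(graphs, min_support=2)
modules = cluster(nodes, weights, threshold=0.6)
for m in modules:
    print(sorted(m))
```

In this toy input, each resulting module is a set of co-expression edges rather than a set of genes, which is the sense in which the abstract speaks of "link clusters"; the genes of a module can be recovered as the union of the endpoints of its edges.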

Keywords: co-expression networks; edge-edge similarity; biological modules.

DOI: 10.1504/IJDMB.2017.086098

International Journal of Data Mining and Bioinformatics, 2017 Vol.18 No.1, pp.28 - 39

Received: 06 May 2017
Accepted: 06 May 2017

Published online: 24 Aug 2017
