Title: A novel network and sparsity constraint regression model for functional module identification in genomic data analysis

Authors: Zheng Xia; Wei Chen; Chunqi Chang; Xiaobo Zhou

Addresses: Department of Radiology, The Methodist Hospital Research Institute, Houston, TX 77030, USA; Weill Cornell Medical College, New York, NY 10065, USA ' Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China ' Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China ' Department of Radiology, The Methodist Hospital Research Institute, Houston, TX 77030, USA; Weill Cornell Medical College, New York, NY 10065, USA

Abstract: Incorporating accumulated knowledge of biological pathways and interactions into genome-wide association studies is important for elucidating correlations between genetic variants and disease. Although a number of methods have recently been developed to identify disease-related genes using prior biological knowledge, most only encourage smoothness of the coefficients along the network, which does not address the case where two connected genes both have positive or negative effects on the response. To overcome this issue, we propose applying the Laplacian operation to the absolute values of the coefficients, to take account of the positive and negative effects, together with an L1 norm term to impose sparsity. Further, an efficient algorithm is developed to compute the whole solution path. Simulation studies show that the proposed method performs better than network-constrained regularisation without absolute values. Applying our method to a microarray data set of Alzheimer's disease (AD) identifies several subnetworks on Kyoto Encyclopedia of Genes and Genomes (KEGG) transcriptional pathways that are related to the progression of AD. Many of these findings are confirmed by published literature.
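The difference between a Laplacian penalty on the raw coefficients and one on their absolute values can be illustrated with a small numerical sketch. This is not the authors' implementation: the abstract does not give the exact objective, so the unnormalised Laplacian L = D - A, the function names, the weights `lam1` and `lam2`, and the two-gene toy network are all assumptions made here for illustration only.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Unnormalised graph Laplacian L = D - A (assumed form; the paper may normalise it)."""
    adjacency = np.asarray(adjacency, dtype=float)
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


def signed_network_penalty(beta, laplacian, lam1, lam2):
    """L1 term plus Laplacian quadratic form on the raw coefficients."""
    beta = np.asarray(beta, dtype=float)
    return lam1 * np.abs(beta).sum() + lam2 * (beta @ laplacian @ beta)


def abs_network_penalty(beta, laplacian, lam1, lam2):
    """L1 term plus Laplacian quadratic form on the absolute values of the coefficients."""
    abs_beta = np.abs(np.asarray(beta, dtype=float))
    return lam1 * abs_beta.sum() + lam2 * (abs_beta @ laplacian @ abs_beta)


# Hypothetical toy network: two genes joined by a single edge.
adjacency = [[0, 1],
             [1, 0]]
laplacian = graph_laplacian(adjacency)

# Illustrative coefficients of equal magnitude and opposite sign.
beta = [1.0, -1.0]

signed = signed_network_penalty(beta, laplacian, lam1=0.5, lam2=1.0)
absolute = abs_network_penalty(beta, laplacian, lam1=0.5, lam2=1.0)
print(signed, absolute)
```

In this toy case the quadratic form on the raw coefficients equals (1 - (-1))^2 = 4, whereas on the absolute values it is 0, so only the L1 term contributes to the second penalty. A penalty of the second kind depends on the magnitudes of connected coefficients rather than on their signs.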

Keywords: Laplacian matrix; elastic net; accumulated biological pathways; whole solution path; network constraints; sparsity constraints; regression modelling; functional module identification; genomic data analysis; bioinformatics; genetic variants; disease; simulation; microarray data; Alzheimer's disease.

DOI: 10.1504/IJDMB.2013.056081

International Journal of Data Mining and Bioinformatics, 2013 Vol.8 No.3, pp.311 - 325

Received: 02 May 2011
Accepted: 02 May 2011

Published online: 20 Oct 2014
