Title: Pattern discovery of multivariate phenotypes by Association Rule Mining and its scheme for Genome-Wide Association Studies

Authors: Sung Hee Park; Sangsoo Kim

Addresses: Department of Bioinformatics and Life Sciences, Soongsil University, Seoul 156-743, South Korea; Department of Bioinformatics and Life Sciences, Soongsil University, Seoul 156-743, South Korea

Abstract: Genome-Wide Association Studies (GWAS) have played a crucial role in identifying disease-susceptibility loci for single traits. However, GWAS have been limited in detecting genetic risk factors for multivariate phenotypes that arise from the pleiotropic effects of genetic loci. This work reports a data mining approach that discovers patterns of multivariate phenotypes, expressed as association rules, and presents an analytical scheme for GWAS of these newly defined multivariate phenotypes. We identified 13 SNPs for four genes (CSMD1, NFE2L1, CBX1, and SKAP1) associated with a new multivariate phenotype defined as low levels of low-density lipoprotein cholesterol (LDL-C ≤ 100 mg/dl) and high levels of triglycerides (TG ≥ 180 mg/dl). Compared with a traditional approach to GWAS, using the discovered multivariate phenotypes can be advantageous for identifying pleiotropic genetic risk factors, which may play a common etiological role in the multivariate phenotypes.
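
As an illustration of how a multivariate phenotype can be expressed as an association rule, the following minimal Python sketch scores the rule {LDL-C low} -> {TG high} on a handful of records. The two thresholds are the ones quoted in the abstract; everything else is an assumption made for illustration only: the toy values are invented and are not data from the study, the names (`records`, `to_items`, `support`, `rule_metrics`) are hypothetical, and support, confidence and lift are the standard rule measures, which may differ from the criteria the authors actually applied.

```python
# Toy records as (LDL-C mg/dl, TG mg/dl).
# ASSUMPTION: invented values for illustration, NOT data from the study.
records = [
    (90, 200),
    (95, 190),
    (130, 150),
    (85, 120),
    (140, 210),
    (100, 180),
    (160, 100),
    (120, 90),
]

LDL_LOW_MAX = 100  # threshold quoted in the abstract: LDL-C <= 100 mg/dl
TG_HIGH_MIN = 180  # threshold quoted in the abstract: TG >= 180 mg/dl


def to_items(ldl, tg):
    """Discretise one record into a set of categorical items."""
    items = set()
    if ldl <= LDL_LOW_MAX:
        items.add("LDL_low")
    if tg >= TG_HIGH_MIN:
        items.add("TG_high")
    return items


# One "transaction" (item set) per individual.
transactions = [to_items(ldl, tg) for ldl, tg in records]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent):
    """Return (support, confidence, lift) for antecedent -> consequent."""
    s_both = support(antecedent | consequent)
    confidence = s_both / support(antecedent)
    lift = confidence / support(consequent)
    return s_both, confidence, lift


rule_support, rule_confidence, rule_lift = rule_metrics({"LDL_low"}, {"TG_high"})

# Binary case/control label for the combined phenotype: 1 if both items hold.
case_labels = [int({"LDL_low", "TG_high"} <= t) for t in transactions]

print(rule_support, rule_confidence, rule_lift)
print(case_labels)
```

In a sketch like this, individuals matching both sides of a sufficiently strong rule would be treated as cases for the combined phenotype, and the resulting binary label could then be passed to a conventional single-trait association test.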

Keywords: GWAS; genome-wide association study; SNPs; single nucleotide polymorphisms; multivariate phenotypes; ARM; association rule mining; pleiotropy; CSMD1; triglycerides; low density lipoprotein cholesterol; pattern discovery; data mining; bioinformatics.

DOI: 10.1504/IJDMB.2012.049299

International Journal of Data Mining and Bioinformatics, 2012 Vol.6 No.5, pp.505 - 520

Received: 29 Apr 2011
Accepted: 29 Apr 2011

Published online: 17 Dec 2014
