Title: Genome-wide efficient attribute selection for purely epistatic models via Shannon entropy

Authors: Amirhossein Manzourolajdad, Mohammad Saraee, Aghafakhr Mirlohi, Abolfazl Javan

Addresses: Department of Electrical and Computer Engineering, Isfahan University of Technology (IUT), Isfahan, Iran; Department of Electrical and Computer Engineering, Isfahan University of Technology (IUT), Isfahan, Iran; Department of Agricultural Biotechnology, College of Agriculture, Isfahan University of Technology (IUT), Isfahan, Iran; Department of Electrical and Computer Engineering, Isfahan University of Technology (IUT), Isfahan, Iran

Abstract: Epistasis plays an important role in the genetic architecture of common human diseases. Most complex diseases are believed to have multiple contributing loci, often with subtle patterns that make them difficult to find in large data sets. Disorders that follow purely epistatic models cannot be detected by case-control studies based on individual analysis of susceptible loci. Exhaustive searches for such models are computationally infeasible in genome-wide applications. Furthermore, with the ever-increasing number of both genotypes and individuals on one side, and little knowledge of complex traits on the other, it is becoming difficult and time-consuming to perform systematic genome-wide studies on such traits. We present and discuss a convenient framework for modelling epistasis using information-theoretic concepts, together with algorithms inspired by this approach. These generalised algorithms, which particularly favour purely epistatic models, are applied to both simulated and real data. The real data represent the genotype-phenotype values for age-related macular degeneration (AMD). Many two-locus purely epistatic patterns were found for AMD. A new visualisation approach is also presented to better illustrate epistasis in cases where the number of loci is more than two or three.
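
The claim that purely epistatic models escape single-locus analysis can be illustrated with a minimal sketch. It is not the authors' algorithm: the two binary loci, the XOR phenotype, and the helper names `entropy` and `mutual_information` are all assumptions chosen for illustration. Each locus alone shares no information with the phenotype, while the pair of loci determines it completely.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Empirical Shannon entropy (in bits) of a list of hashable values."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical toy data: two binary loci, every genotype combination equally
# frequent, and a phenotype given by XOR (an assumed purely epistatic model).
genotypes = list(product([0, 1], repeat=2))
locus_a = [g[0] for g in genotypes]
locus_b = [g[1] for g in genotypes]
phenotype = [a ^ b for a, b in genotypes]

mi_a = mutual_information(locus_a, phenotype)    # single-locus signal for A
mi_b = mutual_information(locus_b, phenotype)    # single-locus signal for B
mi_ab = mutual_information(genotypes, phenotype) # joint two-locus signal

print(f"I(A; P) = {mi_a:.3f} bits")
print(f"I(B; P) = {mi_b:.3f} bits")
print(f"I(A,B; P) = {mi_ab:.3f} bits")
```

In this toy setting the single-locus terms vanish and only the joint term is non-zero, which is why a search restricted to individual loci would miss the pattern. Real genotype data have three genotype classes per locus, unequal frequencies, and sampling noise, so the estimates would not be exactly zero or one.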

Keywords: complex diseases; genetic markers; gene mapping; genome-wide approach; epistasis modelling; multivariate mutual information; attribute selection; data mining; positive-negative interactions; Shannon entropy; information theory; age-related macular degeneration; AMD.

DOI: 10.1504/IJBIDM.2008.022736

International Journal of Business Intelligence and Data Mining, 2008, Vol. 3, No. 4, pp. 390-408

Available online: 25 Jan 2009
