Title: Principal variable approach to multipurpose SNP selection in genetic association studies

Authors: Seunghyun Lee; Taesung Park; Mira Park

Addresses: Department of Statistics, Korea University, Seoul, Korea; Department of Statistics, Seoul National University, Seoul, Korea; Department of Preventive Medicine, Eulji University, Daejeon, Korea

Abstract: Despite the merits of jointly analysing multiple markers, single-marker analysis is still widely used in Genome-Wide Association Studies (GWAS). Because GWAS data tend to contain many near-duplicate SNPs in linkage disequilibrium, it is challenging to eliminate the redundant SNPs and identify the subset of informative SNPs to include in a joint analysis. In this study, we propose an unsupervised SNP selection algorithm based on the principal variable approach, called the multipurpose SNP selection (MP-SNP) method. The MP-SNP method selects a subset of the original variables that preserves their structure and information, and the resulting SNP subset can be used in a variety of further analyses. Based on our simulation and real data analysis, we conclude that the MP-SNP method performs well in selecting informative SNPs and also yields well-explained cluster structures.
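
The general idea of principal variable selection can be illustrated with a small sketch. This is not the MP-SNP algorithm itself, whose details are not given in the abstract; it is a generic greedy variant that is assumed here for illustration. At each step it picks the variable that explains the most remaining variance across all variables, then removes that variable's contribution from the correlation matrix. The function names, the 0/1/2 genotype coding, the toy data, and the `eps` tolerance are all hypothetical.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [x - mean for x in col]
        norm = sqrt(sum(d * d for d in dev))
        if norm == 0:
            # Assumption: monomorphic SNPs are filtered out beforehand.
            raise ValueError("constant column has no defined correlation")
        scaled.append([d / norm for d in dev])
    p = len(columns)
    return [[sum(a * b for a, b in zip(scaled[i], scaled[j]))
             for j in range(p)] for i in range(p)]


def select_principal_variables(columns, k, eps=1e-10):
    """Greedily pick up to k column indices (illustrative sketch only)."""
    r = correlation_matrix(columns)
    p = len(r)
    selected = []
    for _ in range(k):
        best, best_score = None, 0.0
        for j in range(p):
            # Skip variables already chosen or already fully explained.
            if j in selected or r[j][j] <= eps:
                continue
            # Variance in all variables explained by regressing on variable j.
            score = sum(r[i][j] ** 2 for i in range(p)) / r[j][j]
            if score > best_score:
                best, best_score = j, score
        if best is None:
            break  # nothing informative is left
        pivot = r[best][best]
        col = [r[i][best] for i in range(p)]
        # Residual (partial) covariance after removing the chosen variable.
        r = [[r[i][j] - col[i] * col[j] / pivot for j in range(p)]
             for i in range(p)]
        selected.append(best)
    return selected


# Toy genotypes: SNP 1 is a near-duplicate of SNP 0; SNP 2 is unrelated.
genotypes = [
    [0, 1, 2, 0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 2, 2, 2],
]
chosen = select_principal_variables(genotypes, k=2)
print(chosen)
```

In this toy setting the procedure keeps only one SNP from the near-duplicate pair along with the unrelated SNP, which is the kind of redundancy removal the abstract describes. A real GWAS data set would need a matrix library and a windowed or blockwise correlation computation, since a full correlation matrix over all SNPs is not practical.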

Keywords: dimensional reduction; genome-wide association studies; GWAS; informative SNP; single nucleotide polymorphisms; MP-SNP; principal component analysis; PCA; principal variables; SNP clusters; topography; variable selection; bioinformatics; genetic association studies; unsupervised SNP selection; simulation.

DOI: 10.1504/IJDMB.2016.079800

International Journal of Data Mining and Bioinformatics, 2016 Vol.16 No.1, pp.32 - 46

Received: 12 May 2016
Accepted: 15 May 2016

Published online: 14 Oct 2016
