Title: The effect of different sets of critical values on type I error rates in tiled regression for genome-wide association studies

Authors: Heejong Sung; Jeremy A. Sabourin; Alexa J.M. Sorant; Alexander F. Wilson

Address (all authors): Genometrics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive, Suite 1200, Baltimore, MD 21224, USA

Abstract: The effects of different sets of critical values on type I error rates in tiled regression were investigated using genome-wide association data from the Trinity Student Study. Two hundred null traits simulated from the standard normal distribution were analysed using four different sets of critical values for the stepwise regression performed within tiled regression. We observed that (1) both the multicollinearity among the SNPs under consideration and the aggregate type I error rates decreased across the three levels of tiled regression; (2) the region-specific type I error rates were slightly lower than the 'nominal' critical values at the tile level; and (3) the critical value at the tile level lay between the two aggregate type I error rates defined under two different assumptions about the number of tests (the number of SNPs and the number of tiles). The choice of critical value at each stage of tiled regression affects the overall type I error rate.
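To make the idea of an aggregate type I error rate concrete, the following sketch estimates one empirically in the way the abstract outlines: simulate null traits from the standard normal distribution, test each against a set of SNPs, and count the fraction of traits with at least one test below the critical value. It is a simplified illustration under stated assumptions, not the authors' tiled-regression procedure. The genotypes are hypothetical independent SNPs (allele frequency 0.3) rather than Trinity Student Study data, each SNP is tested singly with a Fisher z-test of the correlation instead of stepwise regression within tiles, and the closed-form reference 1 - (1 - c)^m assumes m independent tests. All function names are hypothetical.

```python
import math
import random


def correlation_pvalue(genotypes, trait):
    """Two-sided p-value for a single SNP-trait association.

    Uses the Fisher z-transform of the Pearson correlation with a
    normal approximation (a stand-in for a regression t-test).
    """
    n = len(trait)
    mean_g = sum(genotypes) / n
    mean_t = sum(trait) / n
    sxy = sum((g - mean_g) * (t - mean_t) for g, t in zip(genotypes, trait))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((t - mean_t) ** 2 for t in trait)
    if sxx == 0 or syy == 0:
        return 1.0  # monomorphic SNP or constant trait: no test possible
    r = sxy / math.sqrt(sxx * syy)
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for standard normal Z


def empirical_aggregate_error(n_subjects, n_snps, n_traits, critical_value, seed=1):
    """Fraction of simulated null traits with at least one p < critical_value."""
    rng = random.Random(seed)
    # Hypothetical genotypes: independent SNPs, allele frequency 0.3, coded 0/1/2.
    snps = [
        [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n_subjects)]
        for _ in range(n_snps)
    ]
    false_positive_traits = 0
    for _ in range(n_traits):
        # Null trait: standard normal, unrelated to every SNP by construction.
        trait = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
        if any(correlation_pvalue(g, trait) < critical_value for g in snps):
            false_positive_traits += 1
    return false_positive_traits / n_traits


def expected_aggregate_error(critical_value, n_tests):
    """Aggregate rate 1 - (1 - c)^m, valid only for m independent tests."""
    return 1.0 - (1.0 - critical_value) ** n_tests


def sidak_per_test(alpha, n_tests):
    """Per-test critical value giving aggregate rate alpha over m independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


if __name__ == "__main__":
    c, m = 0.05, 20
    print("empirical aggregate rate:", empirical_aggregate_error(200, m, 200, c))
    print("independent-tests formula:", expected_aggregate_error(c, m))
    print("Sidak per-test value for alpha = 0.05:", sidak_per_test(0.05, m))
```

Because real SNPs are in linkage disequilibrium, the effective number of independent tests is smaller than the number of SNPs, so the formula above overstates the aggregate rate for correlated markers. This is one reason the assumed number of tests (SNPs versus tiles) matters when relating a per-test critical value to an aggregate type I error rate.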

Keywords: multicollinearity; linkage disequilibrium; test of association; GWAS; critical values; type I error rates; tiled regression; genome-wide association studies; bioinformatics; SNPs; single nucleotide polymorphisms.

DOI: 10.1504/IJDMB.2016.080030

International Journal of Data Mining and Bioinformatics, 2016 Vol.16 No.2, pp.111 - 120

Received: 12 May 2016
Accepted: 15 May 2016

Published online: 29 Oct 2016
