Title: The effect of different sets of critical values on type I error rates in tiled regression for genome-wide association studies
Authors: Heejong Sung; Jeremy A. Sabourin; Alexa J.M. Sorant; Alexander F. Wilson
Address (all authors): Genometrics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive, Suite 1200, Baltimore, MD 21224, USA
Abstract: The effects of different sets of critical values on type I error rates in tiled regression were investigated using genome-wide association data from the Trinity Student Study. Two hundred null traits, simulated from the standard normal distribution, were analysed using four different sets of critical values for stepwise regression within tiled regression. We observed that (1) both the multicollinearity among the SNPs considered and the aggregate type I error rates decreased across the three levels of tiled regression; (2) the region-specific type I error rates were slightly lower than the 'nominal' critical values at the tile level; and (3) the critical value at the tile level lay between the two aggregate type I error rates defined under two different assumptions about the number of tests (the number of SNPs and the number of tiles). Thus, the choice of critical value at each stage of tiled regression affects the overall type I error rate.
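As a minimal sketch of how an empirical type I error rate can be estimated from simulated null traits, the Python below draws standard normal traits that are unrelated to any SNP and counts how often single-SNP tests reject at a given critical value. It is not tiled regression and involves no stepwise selection. Everything here is an illustrative assumption: the genotypes are hypothetical independent SNPs (not the Trinity Student Study data, whose SNPs are in linkage disequilibrium), and the function names, sample sizes, minor allele frequency, and seed are made up. The "familywise" proportion (traits with at least one rejection) is one common aggregate measure and is not necessarily the paper's definition of an aggregate type I error rate.

```python
import random
from statistics import NormalDist


def simulate_genotypes(n_subjects, n_snps, maf, rng):
    """Hypothetical independent SNPs coded as 0/1/2 minor-allele counts.

    Independence is a simplifying assumption; real GWAS SNPs are in LD.
    """
    return [[(rng.random() < maf) + (rng.random() < maf) for _ in range(n_subjects)]
            for _ in range(n_snps)]


def prepare_snps(genotypes):
    """Center each SNP and keep its sum of squares; drop monomorphic SNPs."""
    prepared = []
    for g in genotypes:
        mean = sum(g) / len(g)
        centered = [x - mean for x in g]
        sxx = sum(c * c for c in centered)
        if sxx > 0:
            prepared.append((centered, sxx))
    return prepared


def empirical_type_i_error(genotypes, n_traits, alpha, rng):
    """Return (per-test rejection rate, fraction of traits with >= 1 rejection)."""
    snps = prepare_snps(genotypes)
    n_subjects = len(genotypes[0])
    std_normal = NormalDist()
    rejections = 0
    traits_with_rejection = 0
    for _ in range(n_traits):
        # Null trait: standard normal, independent of every SNP.
        trait = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
        any_rejected = False
        for centered, sxx in snps:
            sxy = sum(c * y for c, y in zip(centered, trait))
            # The slope estimate is sxy / sxx with variance 1 / sxx because the
            # trait variance is known to be 1, so z = sxy / sqrt(sxx) is N(0, 1).
            z = sxy / sxx ** 0.5
            p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
            if p_value < alpha:
                rejections += 1
                any_rejected = True
        traits_with_rejection += any_rejected
    per_test_rate = rejections / (n_traits * len(snps))
    familywise_rate = traits_with_rejection / n_traits
    return per_test_rate, familywise_rate


rng = random.Random(2016)
genotypes = simulate_genotypes(n_subjects=100, n_snps=50, maf=0.3, rng=rng)
per_test, familywise = empirical_type_i_error(genotypes, n_traits=200, alpha=0.05, rng=rng)
print(f"per-test rate: {per_test:.4f}, familywise rate: {familywise:.3f}")
```

Because no SNP is associated with the null trait, the per-test rate should sit near the critical value, while the proportion of traits with at least one rejection grows with the number of tests. This is why the assumed number of tests (for example, SNPs versus tiles) matters when type I errors are aggregated.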
Keywords: multicollinearity; linkage disequilibrium; test of association; GWAS; critical values; type I error rates; tiled regression; genome-wide association studies; bioinformatics; SNPs; single nucleotide polymorphisms.
DOI: 10.1504/IJDMB.2016.080030
International Journal of Data Mining and Bioinformatics, 2016 Vol.16 No.2, pp. 111-120
Received: 12 May 2016
Accepted: 15 May 2016
Published online: 29 Oct 2016