Title: Association test for rare variants using the Hamming distance

Authors: Suhyun Hwangbo; Jin-Young Jang; Bermseok Oh; Atsuko Imai-Okazaki; Jurg Ott; Taesung Park

Addresses: Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea ' Department of Surgery and Cancer Research Institute, Seoul National University College of Medicine, Seoul, South Korea ' Department of Biochemistry and Molecular Biology, Kyung Hee University, School of Medicine, Seoul, South Korea ' Department of Genome Informatics, Osaka University, Graduate School of Medicine, Suita, Japan; Intractable Disease Research Centre, Juntendo University, Graduate School of Medicine, Tokyo, Japan ' Laboratory of Statistical Genetics, Rockefeller University, New York, USA ' Department of Statistics, Seoul National University, Seoul, South Korea

Abstract: Recent advances in DNA sequencing technology have given rise to many statistical methods for Rare Variant Association Studies (RVASs), such as burden tests and sequence kernel association tests. However, these methods usually require large samples and can lose power in association studies with small samples. In this study, we propose two statistical approaches applicable to RVASs when the sample size is not large. Our approaches are based on the Hamming distance, which measures the dissimilarity of Single Nucleotide Polymorphism (SNP) components between cases and controls. Existing Hamming distance-based methods mainly analyse common variants. For rare variant data with a small sample size, we extended two existing methods by using weights based on minor allele frequency (MAF). Through simulation studies, we show that our proposed approaches control type 1 error rates and are more powerful even with very small sample sizes. They also work well regardless of the direction of causal SNP effects. Applying these methods to real data, we confirmed that they identified true causal genes well. Based on these results, we believe that our proposed methods are powerful for small sample data.
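
As a rough illustration of the idea in the abstract, the sketch below computes a MAF-weighted Hamming distance between two genotype vectors. It is not the authors' statistic, which the abstract does not specify: the function names `maf_weights` and `weighted_hamming`, the 0/1/2 minor-allele-count coding, and the weight 1/sqrt(p(1-p)) with add-one smoothing are all assumptions chosen only to show how rarer variants can be made to count more.

```python
from math import sqrt


def maf_weights(genotypes):
    """Per-SNP weights from pooled minor allele frequency.

    `genotypes` is a list of individuals, each a list of minor-allele
    counts (0, 1 or 2) per SNP. The weight 1/sqrt(p(1-p)) is an assumed,
    illustrative choice; it grows as the variant gets rarer.
    """
    n = len(genotypes)
    n_snps = len(genotypes[0])
    weights = []
    for j in range(n_snps):
        # Add-one smoothing keeps p strictly between 0 and 1,
        # so monomorphic sites still get a finite weight.
        p = (sum(g[j] for g in genotypes) + 1) / (2 * n + 2)
        weights.append(1.0 / sqrt(p * (1.0 - p)))
    return weights


def weighted_hamming(a, b, weights):
    """Sum the weights of the SNP positions where genotypes a and b differ."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


if __name__ == "__main__":
    # Toy data (hypothetical): two cases and two controls, three SNPs.
    cases = [[0, 1, 0], [0, 0, 0]]
    controls = [[0, 0, 0], [2, 0, 0]]
    w = maf_weights(cases + controls)
    # Average weighted distance over all case-control pairs.
    pairs = [(c, k) for c in cases for k in controls]
    mean_dist = sum(weighted_hamming(c, k, w) for c, k in pairs) / len(pairs)
    print(mean_dist)
```

With all weights set to 1 the function reduces to the ordinary Hamming distance, which makes the effect of the MAF weighting easy to compare.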

Keywords: RVASs; rare variant association studies; Hamming distance; MAF; minor allele frequency.

DOI: 10.1504/IJDMB.2018.098938

International Journal of Data Mining and Bioinformatics, 2018 Vol.21 No.4, pp.301 - 314

Received: 22 Dec 2018
Accepted: 22 Dec 2018

Published online: 30 Mar 2019