Title: A comparison of genetic imputation methods using Long Life Family Study genotypes and sequence data with the 1000 Genome reference panel

Authors: Aldi T. Kraja; E. Warwick Daw; Petra Lenzini; Lihua Wang; Shiow J. Lin; Christine A. Williams; Alan B. Wells; Kathryn L. Lunetta; Joanne M. Murabito; Paola Sebastiani; Giuseppe Tosto; Sandra Barral; Ryan L. Minster; Anatoly Yashin; Thomas Perls; Michael A. Province

Addresses: Division of Statistical Genomics, Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Farrell Learning Center, S. Euclid Ave., St. Louis, MO 63110, USA ' Division of Statistical Genomics, Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Farrell Learning Center, S. Euclid Ave., St. Louis, MO 63110, USA ' Division of Statistical Genomics, Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Farrell Learning Center, S. Euclid Ave., St. Louis, MO 63110, USA ' Division of Statistical Genomics, Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Farrell Learning Center, S. Euclid Ave., St. Louis, MO 63110, USA ' Division of Statistical Genomics, Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Farrell Learning Center, S. Euclid Ave., St. Louis, MO 63110, USA ' Division of Statistical Genomics, Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Farrell Learning Center, S. Euclid Ave., St. Louis, MO 63110, USA ' Clinical and Translational Research Institute, University of California San Diego, San Diego, CA 92093, USA ' Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA ' Section of General Internal Medicine, Boston University School of Medicine and Framingham Heart Study, Boston, MA 02218, USA ' Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA ' Taub Institute for Research on Alzheimer's Disease and the Aging Brain, College of Physicians and Surgeons, Columbia University, New York, NY 10032, USA ' Taub Institute for Research on Alzheimer's Disease and the Aging Brain, College of Physicians and Surgeons, Columbia University, New York, NY 10032, USA ' Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA ' Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC 27705, USA ' Division of Geriatrics, Department of Medicine, Boston University School of Medicine, Boston, MA 02215, USA ' Division of Statistical Genomics, Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA

Abstract: This study compares methods of imputing genetic markers, given a typed GWAS scaffold from the Long Life Family Study (LLFS) and the latest 1000 Genomes (1000G) reference panel. We examined two programs for pre-phasing haplotypes (MACH and SHAPEIT2) and two for imputation (MINIMAC and IMPUTE2). SHAPEIT2 is advantageous for haplotype pre-phasing. MINIMAC and IMPUTE2 produced similar imputation quality. The comparison used a 4 Mb region on chromosome 2 of LLFS; in the Supplement, we also compared methods using chromosome 19 data from Genetic Analysis Workshop 19. IMPUTE2 had the advantage of using two references: 1000G and sequence data for a subset of subjects. SHAPEIT2 and IMPUTE2 were used to finalise the full LLFS autosome imputation. In LLFS, 44% of ~80M imputed autosomal variants showed good imputation quality (info ≥ 0.30). Low imputation quality was associated predominantly with low allele frequency in 1000G. Emerging large-scale sequence data and enhanced imputation methodologies will further improve imputation quality.

Keywords: genetic imputation; 1000 Genomes reference; sequence reference; MACH software; MINIMAC software; SHAPEIT2 software; IMPUTE2 software; FCGENE software; LLFS; long life family study.

DOI: 10.1504/IJBRA.2020.104855

International Journal of Bioinformatics Research and Applications, 2020 Vol.16 No.1, pp.59 - 84

Received: 15 Apr 2017
Accepted: 27 Nov 2017

Published online: 05 Feb 2020
