Title: Application of SNPViz v2.0 using next-generation sequencing data sets in the discovery of potential causative mutations in candidate genes associated with phenotypes

Authors: Shuai Zeng; Mária Škrabišová; Zhen Lyu; Yen On Chan; Nicholas Dietz; Kristin Bilyeu; Trupti Joshi

Addresses: Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, Missouri, USA; Christopher S. Bond Life Sciences Centre, University of Missouri-Columbia, Columbia, Missouri, USA ' Department of Biochemistry, Faculty of Science, Palacký University in Olomouc, Olomouc, Czech Republic ' Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, Missouri, USA; Department of Health Management and Informatics, University of Missouri-Columbia, Columbia, Missouri, USA ' Department of Health Management and Informatics, University of Missouri-Columbia, Columbia, Missouri, USA; MU Institute for Data Science and Informatics, University of Missouri-Columbia, Columbia, Missouri, USA ' Division of Plant Sciences, University of Missouri-Columbia, Columbia, Missouri, USA ' USDA/ARS Plant Genetics Research Unit, University of Missouri-Columbia, Columbia, Missouri, USA ' Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, Missouri, USA; Department of Health Management and Informatics, University of Missouri-Columbia, Columbia, Missouri, USA

Abstract: Single Nucleotide Polymorphisms (SNPs) and insertions/deletions (Indels) are the most common biological markers and are distributed across all chromosomes of the genome. Owing to the large volume of SNP and Indel data that has become available during the last ten years, it is a challenge to intuitively integrate, compare, or visualise these variants and to analyse them effectively across multiple samples simultaneously. Genome-Wide Association Studies (GWAS) are an approach for finding genetic variants associated with a trait, but they lack an efficient way of investigating the functions of genomic variants. To tackle these issues, we developed SNPViz v2.0, a web-based tool designed to visualise large-scale haplotype blocks with detailed SNPs and Indels grouped by their chromosomal coordinates, along with their overlapping gene models, phenotype-to-genotype accuracies, Gene Ontology (GO) terms, protein families (Pfam), and functional effects. SNPViz v2.0 is available in both SoyKB and KBCommons. For soya bean, SNPViz v2.0 is available online at: http://soykb.org/SNPViz2/. For other plants such as Arabidopsis thaliana and Zea mays, SNPViz v2.0 is available within their respective knowledge bases online at: https://kbcommons.org.

Keywords: SNP; NGS; genotypes; phenotypes; visualisation.

DOI: 10.1504/IJDMB.2021.116886

International Journal of Data Mining and Bioinformatics, 2021 Vol.25 No.1/2, pp.65-85

Received: 09 Mar 2021
Accepted: 05 Apr 2021

Published online: 05 Aug 2021
