Title: Structural variation calling and genotyping by moment-based deep convolutional neural networks

Authors: Timothy Becker; Dong-Guk Shin

Addresses: Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, USA; Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, USA

Abstract: Structural variation (SV) calling and genotyping remain an ongoing challenge for next-generation sequencing technologies. The gold-standard approach for genome consortia has been to utilise multiple SV calling algorithms and merge their results based on SV type and coordinates and, more recently, to make use of multiple sequencing technologies for each sample cell line. This ensemble strategy provides more comprehensive SV calling but comes at the cost of long compute run times. We make use of popular open-source machine learning libraries to formulate a new data representation suitable for mining whole genome sequences in a fraction of the ensemble run time. We then compare the results to several well-established methods and ensembles. Our pure machine learning method demonstrates a new direction in technique, where feature selection and region filtering are no longer required to achieve desirable false positive rates.

Keywords: genomic variation; structural variation; data representation; moment-based tensors; machine learning; convolutional neural networks.

DOI: 10.1504/IJDMB.2021.116880

International Journal of Data Mining and Bioinformatics, 2021 Vol.25 No.1/2, pp.37 - 52

Received: 15 Mar 2021
Accepted: 05 Apr 2021

Published online: 05 Aug 2021
