Authors: Shenglong Zhu; Scott J. Emrich; Danny Z. Chen
Addresses: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA; Min H. Kao Department of Electrical Engineering & Computer Science, The University of Tennessee, Knoxville, Knoxville, TN 37996, USA; Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
Abstract: Structural variations have received considerable attention in the past decade owing to their importance in disease aetiology and ecological adaptation. Many prior efforts have exploited short paired-end reads to detect structural variations and, more recently, improved approaches have combined newer long reads with short ones to better predict variants. In this paper, we propose a new computational framework that uses only long reads to target a specific type of structural variation: large inversions. Our approach is complementary to state-of-the-art methods but differs in that it models inversion identification as a Max-Cut problem. We show that this new approach is effective at predicting large inversions when compared with current structural variation detection tools. This new formulation also uncovers more complex structural variants that are not discovered by alternative frameworks. We conclude that our new approach is potentially powerful for detecting inversions in complex genomes.
Keywords: inversion detection; PacBio long reads; structural variation; short paired-end reads; large inversion; breakpoint detection; max cut; complex inversion; complex genomes; InvDet; range minimum query; simple inversion; approximation algorithm; validated segment; mate pairs.
International Journal of Data Mining and Bioinformatics, 2018 Vol.20 No.3, pp.230 - 246
Available online: 13 Sep 2018