Title: Concod: an effective integration framework of consensus-based calling deletions from next-generation sequencing data

Authors: Lei Cai; Chong Chu; Xiaodong Zhang; Yufeng Wu; Jingyang Gao

Addresses: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China; Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA; College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China; Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA; College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China

Abstract: Detecting structural variations such as deletions from the short reads produced by next-generation sequencing is a significant but challenging problem in genome analysis. This paper proposes a conceptual framework to improve deletion calling. Although a large number of detection tools have been developed, no single method clearly outperforms all the others. A widely used approach at present is merging, which combines the features used by several methods to achieve more accurate deletion calling. However, most existing methods that take this combining approach are heuristic, and their call sets still contain many false deletions. In this paper, we introduce Concod, an effective integration framework that uses machine learning to detect deletions. First, Concod collects candidate deletions from multiple existing deletion detection tools. Then, based on the theories behind these detection methods, features of the candidates are extracted from the sequence data. Finally, a machine learning model is trained on these features to distinguish true candidates from false ones. We test our framework on real data at different coverage levels and compare it with existing tools, including Pindel, SVseq2, BreakDancer and DELLY. Results show that Concod significantly improves both the precision and the sensitivity of deletion detection.
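
The first two steps in the abstract (collecting candidates from several callers, then describing each candidate by features) can be illustrated with a minimal sketch. It is not Concod's implementation. The abstract does not state the merging criterion or the feature set, so the 50% reciprocal-overlap rule, the greedy merge, the `Candidate` class and the functions `merge_calls` and `feature_vector` are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A consensus deletion candidate (hypothetical structure, not from the paper)."""
    chrom: str
    start: int
    end: int
    callers: set = field(default_factory=set)


def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Return the smaller of the two overlap fractions, or 0.0 if the intervals are disjoint."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def merge_calls(calls_by_tool, min_ro=0.5):
    """Greedily merge calls from several tools into candidates.

    calls_by_tool maps a tool name to a list of (chrom, start, end) tuples.
    min_ro is an assumed reciprocal-overlap threshold. A merged candidate
    keeps the coordinates of the first call that created it.
    """
    merged = []
    all_calls = sorted(
        (chrom, start, end, tool)
        for tool, calls in calls_by_tool.items()
        for chrom, start, end in calls
    )
    for chrom, start, end, tool in all_calls:
        for cand in merged:
            same_chrom = cand.chrom == chrom
            if same_chrom and reciprocal_overlap(cand.start, cand.end, start, end) >= min_ro:
                cand.callers.add(tool)
                break
        else:
            # No existing candidate matched, so this call starts a new one.
            merged.append(Candidate(chrom, start, end, {tool}))
    return merged


def feature_vector(cand, tools):
    """Assumed toy features: deletion length plus one 0/1 support flag per tool."""
    return [cand.end - cand.start] + [1 if t in cand.callers else 0 for t in tools]


if __name__ == "__main__":
    tools = ["Pindel", "SVseq2", "BreakDancer", "DELLY"]
    calls = {
        "Pindel": [("chr1", 1000, 2000)],
        "DELLY": [("chr1", 1050, 2010), ("chr1", 5000, 5600)],
        "BreakDancer": [("chr1", 5100, 5700)],
    }
    for cand in merge_calls(calls):
        print(cand.chrom, cand.start, cand.end, feature_vector(cand, tools))
```

In the third step, vectors like these, labelled true or false against a validated call set, would be the input to a classifier. The abstract does not name the model, so none is assumed here.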

Keywords: structural variations; deletion detection; machine learning; feature extraction; next-generation sequencing.

DOI: 10.1504/IJDMB.2017.084267

International Journal of Data Mining and Bioinformatics, 2017 Vol.17 No.2, pp.153 - 172

Available online: 22 May 2017
