Title: Smart variant filtering

Authors: Vladimir Kovacevic; Predrag Obradovic

Addresses: School of Electrical Engineering, University of Belgrade, Belgrade, Serbia; School of Electrical Engineering, University of Belgrade, Belgrade, Serbia

Abstract: Variant filtering, a step in the genome reconstruction process, identifies falsely called variants. The availability of truth-set variants published for several human DNA samples enabled the creation of Smart Variant Filtering, a machine learning-based tool and framework for filtering germline variants. Conceptually, the framework selects an optimal machine learning algorithm, configuration, and set of features, and produces a model used to filter novel variants. In a direct comparison, we demonstrate that the presented solution outperforms the variant filtering currently used within most secondary DNA analyses. Smart Variant Filtering increases the precision of called single nucleotide variants (that is, it removes false positives) by up to 0.2% while keeping the overall F-score 0.12-0.27% higher than in existing solutions. The precision of calling insertions and deletions increases by up to 7.8%, while the F-score increases by 0.1% to 3.2%.
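
The abstract reports results as precision and F-score. The sketch below shows how these metrics follow from counts of true positives, false positives, and false negatives measured against a truth set, and why removing false positives can raise precision and F-score even if a few true variants are lost. The function name and all counts are hypothetical illustrations, not figures or code from the article.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from variant-call counts.

    tp: called variants that are in the truth set
    fp: called variants that are not in the truth set
    fn: truth-set variants that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts before filtering (assumed for illustration only).
before = precision_recall_f1(tp=9900, fp=300, fn=100)

# Hypothetical counts after filtering: most false positives are removed,
# at the cost of discarding a few true variants.
after = precision_recall_f1(tp=9890, fp=100, fn=110)

print("before: precision=%.4f recall=%.4f f1=%.4f" % before)
print("after:  precision=%.4f recall=%.4f f1=%.4f" % after)
```

In this made-up example recall drops slightly, but precision and F-score both rise, which is the kind of trade-off the reported precision and F-score gains describe.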

Keywords: genomic variant filtering; variant calling; machine learning.

DOI: 10.1504/IJDMB.2021.126836

International Journal of Data Mining and Bioinformatics, 2021 Vol.26 No.3/4, pp.151 - 165

Accepted: 08 Jun 2022
Published online: 08 Nov 2022
