Title: Smart variant filtering
Authors: Vladimir Kovacevic; Predrag Obradovic
Addresses: School of Electrical Engineering, University of Belgrade, Belgrade, Serbia; School of Electrical Engineering, University of Belgrade, Belgrade, Serbia
Abstract: Variant filtering, as part of the genome reconstruction process, is used to identify falsely called variants. The availability of truth-set variants published for several human DNA samples enabled the creation of Smart Variant Filtering, a machine learning-based tool and framework for filtering germline variants. Conceptually, the framework consists of selecting an optimal machine learning algorithm, configuration, and set of features, and then producing a model that is used to filter novel variants. In a direct comparison, we demonstrated that the presented solution outperforms the variant filtering currently used within most secondary DNA analyses. Smart Variant Filtering increases the precision of called single nucleotide variants (that is, it removes false positives) by up to 0.2% while keeping the overall F-score 0.12-0.27% higher than in existing solutions. For insertions and deletions, precision is increased by up to 7.8%, while the F-score increase is in the range of 0.1% to 3.2%.
Keywords: genomic variant filtering; variant calling; machine learning.
DOI: 10.1504/IJDMB.2021.126836
International Journal of Data Mining and Bioinformatics, 2021 Vol.26 No.3/4, pp.151 - 165
Accepted: 08 Jun 2022
Published online: 08 Nov 2022
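
The abstract reports results as precision and F-score. As a reminder of how filtering out false positives moves these metrics, here is a minimal sketch in Python. It assumes the F-score is the balanced F1 measure (the abstract does not state the weighting), and all function names and variant counts are hypothetical illustrations, not values or code from the paper.

```python
# Minimal sketch: precision, recall and F1 from variant-call counts.
# Assumption: "f-score" in the abstract means the balanced F1 measure.
# All counts below are hypothetical and are NOT taken from the paper.


def precision(tp: int, fp: int) -> float:
    """Fraction of called variants that are true variants."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of true variants that were called."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (F1)."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical call set before filtering.
before = {"tp": 9900, "fp": 100, "fn": 100}
# Hypothetical call set after filtering: 40 false positives are removed,
# at the cost of 5 true variants wrongly filtered (they become false negatives).
after = {"tp": 9895, "fp": 60, "fn": 105}

for label, counts in (("before", before), ("after", after)):
    p = precision(counts["tp"], counts["fp"])
    f = f_score(**counts)
    print(f"{label}: precision={p:.4f} f_score={f:.4f}")
```

In this illustrative scenario the filter trades a small loss in recall for a larger gain in precision, so the F-score still rises, which is the kind of outcome the abstract describes.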