Generating simulated SNP array and sequencing data to assess genomic segmentation algorithms
Online publication date: Wed, 31-Mar-2021
by Mark R. Zucker; Kevin R. Coombes
International Journal of Computational Biology and Drug Design (IJCBDD), Vol. 13, No. 5/6, 2020
Abstract: We developed a tool, implemented in an R package called true and accurate clone generator (TACG), to simulate realistic SNP array and single nucleotide variant (SNV) data with a known 'ground truth'. We present TACG and use it to assess several approaches to segmenting copy number data from SNP arrays, with a particular interest in detecting copy number variations (CNVs) in cancer samples. We demonstrate that DNAcopy, an algorithm using circular binary segmentation, generally performs best, in agreement with previous research. We determine the conditions under which it and other methods break down. In particular, we assess how characteristics such as clonal heterogeneity, the presence of nested CNVs, and the type of aberration affect algorithm accuracy. The simulations we generated proved useful for determining not only the comparative overall accuracy of different algorithms but also how their efficacy is affected by the biological characteristics of the samples from which the data were generated.
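To make the idea of simulated 'ground truth' concrete, the following is a minimal sketch in Python of how clonal heterogeneity dilutes a copy number signal. It is not TACG's interface or method: the function names, the Gaussian noise model, and the default parameters are all assumptions made for illustration. It relies only on the standard relation that, when a fraction f of cells carries c copies and the rest carry 2, the expected log2 ratio is log2((f*c + (1 - f)*2) / 2).

```python
import math
import random


def expected_log_ratio(copy_number, clone_fraction, normal_copies=2):
    """Expected log2 ratio when only a fraction of cells carries the aberration.

    Hypothetical helper for illustration; not part of TACG.
    """
    mean_copies = (clone_fraction * copy_number
                   + (1 - clone_fraction) * normal_copies)
    return math.log2(mean_copies / normal_copies)


def simulate_log_ratios(segments, clone_fraction, noise_sd=0.2, seed=0):
    """Simulate noisy log2 ratios for consecutive segments.

    segments: list of (n_markers, copy_number) pairs, in genomic order.
    Returns the simulated values and the true breakpoint positions
    (indices where one segment ends and the next begins).
    The Gaussian noise model and noise_sd default are assumptions.
    """
    rng = random.Random(seed)
    values, ends = [], []
    for n_markers, copy_number in segments:
        mu = expected_log_ratio(copy_number, clone_fraction)
        values.extend(rng.gauss(mu, noise_sd) for _ in range(n_markers))
        ends.append(len(values))
    # The end of the last segment is not a breakpoint between segments.
    return values, ends[:-1]


# A single-copy loss flanked by normal regions, present in half of the cells.
values, true_breakpoints = simulate_log_ratios(
    [(100, 2), (50, 1), (100, 2)], clone_fraction=0.5
)
```

Because the true breakpoints are known, the output of any segmentation algorithm run on `values` can be scored against `true_breakpoints`. Lowering `clone_fraction` shrinks the expected shift toward zero, which is one way a simulation can probe where a method starts to miss aberrations.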