Title: Greedily assemble tandem repeats for next generation sequences

Authors: Yongqing Jiang; Jinhua Lu; Jingyu Hou; Wanlei Zhou

Addresses: Deakin University, 221 Burwood Highway, Burwood, VIC 3125, Australia; National University of Singapore, Blk MD4, 5 Science Drive 2, Singapore 117597, Singapore; Deakin University, 221 Burwood Highway, Burwood, VIC 3125, Australia; University of Technology Sydney, 15 Broadway, Ultimo NSW 2007, Australia

Abstract: Eukaryotic genomes contain large intronic and intergenic regions in which repetitive sequences are abundant. These repetitive sequences make it difficult to assign short reads generated by next generation sequencing to genomic positions, so they are often excluded from analysis, and valuable genomic information is lost. Here we present a method, the tandem repeat assembler (TRA), that assembles repetitive sequences by constructing contigs directly from paired-end reads. Using an experimentally acquired data set for human chromosome 14, tandem repeats longer than 200 bp were assembled. Alignment of the contigs to the human genome reference (GRCh38) showed that 84.3% of tandem repetitive regions were correctly covered. For tandem repeats, the method outperformed state-of-the-art assemblers, achieving a correct contig N50 of up to 512 bp.
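The abstract reports results as a contig N50. For readers unfamiliar with the metric, the following is a minimal sketch of the standard N50 definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled length. The function name `n50` and the example lengths are illustrative assumptions, not taken from the paper, and the sketch does not reproduce how the authors decide which contigs count as "correct".

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Illustrative helper, not the authors' code: contigs are sorted from
    longest to shortest and accumulated until the running sum reaches at
    least half of the total length; the length of the contig that crosses
    that threshold is the N50.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
example_n50 = n50(example_lengths)
```

A "correct" N50, as used in the abstract, would presumably apply the same calculation only to contigs (or contig segments) that align correctly to the reference; that filtering step is outside this sketch.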

Keywords: tandem repeat; assembly; next generation sequencing; NGS.

DOI: 10.1504/IJHPCN.2019.103536

International Journal of High Performance Computing and Networking, 2019 Vol.15 No.1/2, pp.1 - 11

Received: 06 May 2017
Accepted: 07 May 2018

Published online: 08 Nov 2019
