Int. J. of Computational Biology and Drug Design   »   2017 Vol.10, No.2

 

 

Title: Improving de novo metatranscriptome assembly via machine learning algorithms

 

Authors: Hussein Mohsen; Haixu Tang; Yuzhen Ye

 

Addresses:
School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA

 

Abstract: In this paper, we present DNPipe, a pipeline that processes and filters the transcript contigs reported by existing de novo metatranscriptome assembly algorithms, aiming to improve the quality of de novo assembly of metatranscriptomic sequences. DNPipe consists of expectation-maximisation (EM) and sampling approaches that utilise abundance information of transcript contigs. We tested DNPipe on six metatranscriptomic datasets acquired from a mock microbial community dominated by three (among more than 15) known bacterial genomes. Results show that DNPipe can substantially improve the quality of metatranscriptome assembly, producing longer and more accurate transcripts. The N50 of the contigs increases by 19% (from around 1880 bp to 2250 bp), and the precision of the assembly improves by up to 8.7%, reaching about 81%. DNPipe assemblies are also of higher quality than those produced by Trinity. The DNPipe tool can be downloaded as open source software at https://sourceforge.net/projects/dnpipe/files/.

 

Keywords: de novo; sequence; assembly; expectation-maximisation; EM; metatranscriptomics; RNA-Seq.

 

DOI: 10.1504/IJCBDD.2017.10004575

 

Int. J. of Computational Biology and Drug Design, 2017 Vol.10, No.2, pp.91 - 107

 

Available online: 15 Apr 2017

 

 
