Title: MARAGAP: a modular approach to reference assisted genome assembly pipeline

Authors: Bilal Wajid; Erchin Serpedin; Mohamed Nounou; Hazem Nounou

Addresses: Department of Engineering, Texas A&M International University, Laredo, TX 78043, USA; Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77840, USA | Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77840, USA | Chemical Engineering Program, Texas A&M University at Qatar, Doha 23874, Qatar | Electrical and Computer Engineering Program, Texas A&M University at Qatar, Doha 23874, Qatar

Abstract: This paper presents MARAGAP, a modular approach to reference assisted genome assembly pipeline. MARAGAP uses the minimum description length (MDL) principle to determine the optimal reference sequence for the assembly. That reference sequence then serves as a template for inferring inversions, insertions, deletions and single nucleotide polymorphisms (SNPs) in the target genome. MARAGAP detects and corrects inversions and deletions with an algorithmic approach, infers insertions with a de Bruijn graph based approach, estimates the locations of insertions with an affine-match, affine-gap local alignment tool, and detects SNPs with a Bayesian estimation framework.
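
The abstract names a de Bruijn graph based approach for inferring insertions but gives no implementation details. As a minimal sketch of the general technique only, and not MARAGAP's actual code, the following shows how a de Bruijn graph can be built from reads; the function name `build_de_bruijn`, the toy reads and the choice k = 3 are all assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph from reads (hypothetical helper, not from the paper).

    Nodes are (k-1)-mers. Each k-mer found in a read adds one edge from its
    (k-1)-length prefix to its (k-1)-length suffix; repeated k-mers add
    repeated edges, so edge multiplicity reflects k-mer coverage.
    """
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of width k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy input (assumed): two overlapping reads.
reads = ["ACGTC", "CGTCA"]
graph = build_de_bruijn(reads, k=3)
for node in sorted(graph):
    print(node, "->", graph[node])
```

Walking the edges of such a graph spells out candidate sequences; how MARAGAP traverses the graph and places the resulting sequences is not described in the abstract.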

Keywords: next generation sequencing; genome assembly; reference assisted assembly; graph theory; Bayesian statistics; minimum description length principle; De-Bruijn graph; SNPs; single nucleotide polymorphisms; mutations; local alignment; reference sequences; inversions; insertions; deletion; bioinformatics.

DOI: 10.1504/IJCBDD.2015.072073

International Journal of Computational Biology and Drug Design, 2015 Vol.8 No.3, pp.226 - 250

Received: 23 Jul 2014
Accepted: 01 Oct 2014

Published online: 30 Sep 2015
