Title: Frameshift detection in prokaryotic genomic sequences

Authors: Andrey Kislyuk, Alexandre Lomsadze, Alla L. Lapidus, Mark Borodovsky

Addresses: School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30332, USA. ' Daniel Guggenheim School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA. ' Joint Genome Institute, US Department of Energy (DOE-JGI), Walnut Creek, California 94598, USA. ' Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia 30332, USA; Division of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA

Abstract: We have developed a new method for frameshift detection that combines ab initio and alignment-based algorithms and can serve as a tool for sequencing quality control in next-generation sequencing. We evaluated the method's accuracy on test sets of annotated genomic sequences with artificial frameshifts introduced into protein-coding regions. These tests showed that the new method performs comparably to the previously developed FrameD. On sets of sequences produced by 454 pyrosequencing, with sequence errors recovered by Sanger re-sequencing, the method's accuracy held at the same level.

Keywords: Markov chains; machine learning; frameshift; pyrosequencing; genomics; HMM; hidden Markov model; sensitivity; specificity; frameshift detection; prokaryotic genomic sequences; bioinformatics; sequencing quality control; protein coding.

DOI: 10.1504/IJBRA.2009.027519

International Journal of Bioinformatics Research and Applications, 2009 Vol.5 No.4, pp.458 - 477

Published online: 28 Jul 2009
