Title: Algorithms for mapping short degenerate and weighted sequences to a reference genome

Authors: Pavlos Antoniou, Costas S. Iliopoulos, Laurent Mouchard, Solon P. Pissis

Addresses: University of Cyprus, Department of Computer Science, Nicosia, Cyprus | King's College London, Department of Computer Science, London, UK; Curtin University, Digital Ecosystems and Business Intelligence Institute, Perth, Australia | University of Rouen, LITIS (EA 4108), System and Information Processing, 76821 Mont Saint Aignan Cedex, France; King's College London, Department of Computer Science, London, UK | King's College London, Department of Computer Science, London, UK

Abstract: Novel high-throughput (Deep) sequencing technologies have redefined the way genome sequencing is performed. They can produce millions of short sequences in a single experiment, at a much lower cost than previous methods. In this paper, we address the problem of efficiently mapping millions of short sequences to a reference genome and classifying them according to whether or not they occur exactly once in the genome, taking probability scores into consideration. In particular, we design algorithms for Massive Exact and Approximate Pattern Matching of short degenerate and weighted sequences, derived from Deep sequencing technologies, to a reference genome.
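
Illustration: the classification task described in the abstract can be sketched with a naive scan. This is not the authors' algorithm, which is not given here; it only shows what matching a degenerate or weighted read against a reference, and classifying it by number of occurrences, might look like on a toy scale. It assumes that degenerate reads use the IUPAC ambiguity codes, that a weighted read is a list of per-position base probabilities, and that a weighted read matches when the product of those probabilities reaches a chosen threshold. All function names and the threshold convention are hypothetical.

```python
# IUPAC degenerate nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degenerate_occurrences(read, genome):
    """Start positions where every genome base is allowed by the read's code."""
    allowed = [IUPAC[symbol] for symbol in read]
    m = len(allowed)
    return [
        i
        for i in range(len(genome) - m + 1)
        if all(genome[i + j] in allowed[j] for j in range(m))
    ]


def weighted_occurrences(read, genome, threshold):
    """Start positions where the product of per-base probabilities >= threshold.

    `read` is a list of dicts mapping a base to its probability at that
    position (an assumed representation of a weighted sequence).
    """
    m = len(read)
    hits = []
    for i in range(len(genome) - m + 1):
        probability = 1.0
        for j in range(m):
            probability *= read[j].get(genome[i + j], 0.0)
            if probability < threshold:
                break  # this window can no longer reach the threshold
        else:
            hits.append(i)
    return hits


def classify(positions):
    """Label a read by how many times it occurs in the reference."""
    if not positions:
        return "unmapped"
    return "unique" if len(positions) == 1 else "repeated"


if __name__ == "__main__":
    genome = "ACGTACGGAC"
    for read in ("GTA", "ACGK", "TTT"):
        positions = degenerate_occurrences(read, genome)
        print(read, positions, classify(positions))
```

The scan takes time proportional to the genome length times the read length for each read, so it is only suitable for illustration; mapping millions of reads would call for an indexed or filtered approach.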

Keywords: deep sequencing; high-throughput sequencing; string algorithms; degenerate sequences; weighted sequences; genome sequencing; probability scores; pattern matching.

DOI: 10.1504/IJCBDD.2009.030768

International Journal of Computational Biology and Drug Design, 2009, Vol. 2, No. 4, pp. 385-397

Published online: 04 Jan 2010
