Authors: Pavlos Antoniou, Costas S. Iliopoulos, Laurent Mouchard, Solon P. Pissis
Addresses: University of Cyprus, Department of Computer Science, Nicosia, Cyprus. ' King's College London, Department of Computer Science, London, UK; Curtin University, Digital Ecosystems and Business Intelligence Institute, Perth, Australia. ' University of Rouen, LITIS (EA 4108), System and Information Processing, 76821 Mont Saint Aignan Cedex, France; King's College London, Department of Computer Science, London, UK. ' King's College London, Department of Computer Science, London, UK
Abstract: Novel high-throughput (Deep) sequencing technologies have redefined the way genome sequencing is performed. They are able to produce millions of short sequences in a single experiment, at a much lower cost than previous methods. In this paper, we address the problem of efficiently mapping millions of short sequences to a reference genome and classifying them according to whether or not they occur exactly once in the genome, taking probability scores into consideration. In particular, we design algorithms for Massive Exact and Approximate Pattern Matching of short degenerate and weighted sequences, derived from Deep sequencing technologies, to a reference genome.
Keywords: deep sequencing; high-throughput sequencing; string algorithms; degenerate sequences; weighted sequences; genome sequencing; probability scores; pattern matching.
International Journal of Computational Biology and Drug Design, 2009 Vol.2 No.4, pp.385 - 397
Published online: 04 Jan 2010