Title: Searching a pattern in compressed DNA sequences

Authors: Ashutosh Gupta, Suneeta Agarwal

Addresses: Department of Computer Science and Information Technology, I.E.T, MJP Rohilkhand University, Bareilly 243006, India; Department of Computer Science and Engineering, MNNIT, Allahabad 211002, India

Abstract: This paper introduces a novel algorithm for DNA sequence compression that exploits a transformation of the sequence and the statistical properties of the transformed sequence. A word-based tagged code is used to mark the end of each codeword. The word-based encoder assigns codes to words according to their frequency distribution. The resulting compression algorithm is efficient and effective for DNA sequences. Because it is a statistical compression method, it can search for a pattern directly inside the compressed text, which is useful in knowledge discovery. Experiments show that the algorithm outperforms existing compressors on typical DNA sequence datasets.
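
The abstract does not specify the transformation or the exact code, so the following is only an illustrative sketch of the general idea of a frequency-ranked, end-tagged word code that can be searched without decompression. It is not the authors' algorithm. It assumes fixed-length words of k bases and substitutes a byte-oriented End-Tagged Dense Code, in which the high bit of a byte marks the last byte of a codeword. All function names and the choice k=4 are hypothetical.

```python
from collections import Counter

def split_words(seq, k=4):
    # Assumption: words are non-overlapping runs of k bases.
    return [seq[i:i + k] for i in range(0, len(seq), k)]

def etdc_codeword(rank):
    # End-Tagged Dense Code: the last byte has its high bit set (128..255),
    # every earlier byte has it clear (0..127).
    out = [128 + rank % 128]
    rank //= 128
    while rank > 0:
        rank -= 1
        out.insert(0, rank % 128)
        rank //= 128
    return bytes(out)

def build_codebook(words):
    # More frequent words receive lower ranks and hence shorter codewords.
    ranked = [w for w, _ in Counter(words).most_common()]
    return {w: etdc_codeword(r) for r, w in enumerate(ranked)}

def compress(seq, k=4):
    words = split_words(seq, k)
    book = build_codebook(words)
    return b"".join(book[w] for w in words), book

def search(compressed, book, pattern, k=4):
    # Encode the pattern with the same codebook and match bytes directly.
    try:
        needle = b"".join(book[w] for w in split_words(pattern, k))
    except KeyError:
        return []  # some pattern word never occurs in the text
    hits = []
    pos = compressed.find(needle)
    while pos != -1:
        # A true match starts right after a tagged (final) byte,
        # or at the very beginning of the compressed stream.
        if pos == 0 or compressed[pos - 1] >= 128:
            hits.append(pos)
        pos = compressed.find(needle, pos + 1)
    return hits

data, book = compress("ACGTACGTTTTTACGT")
print(search(data, book, "TTTTACGT"))
```

The returned values are byte offsets into the compressed stream. This sketch finds only occurrences aligned to word boundaries; handling arbitrary alignments would need additional work that the abstract does not describe.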

Keywords: biological sequences; DNA compression; pattern searching; sequence compression; DNA sequences; bioinformatics; statistical compression.

DOI: 10.1504/IJBRA.2011.040091

International Journal of Bioinformatics Research and Applications, 2011 Vol.7 No.2, pp.115 - 129

Received: 10 Feb 2010
Accepted: 17 Jun 2010

Published online: 13 May 2011
