Title: Analysis of cross sequence similarities for multiple DNA sequences compression

Authors: Paula Wu, Ngai-Fong Law, Wan-Chi Siu

Addresses: Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong (all three authors)

Abstract: Current DNA compression algorithms rely on finding repetitions within a DNA sequence so that similar subsequences can be encoded by reference to one another. We explore similarities between different chromosomes of Saccharomyces cerevisiae. These cross-sequence similarities are characterised by the existence of similar subsequences shared among different chromosomes: the longer the similar subsequences, the higher the cross-similarity. Our study indicates that cross-sequence similarity is often significant compared with self-sequence similarity. This implies that it would be advantageous to compress two or more chromosome sequences together, so that similar subsequences found across multiple chromosome sequences can be encoded together.

Keywords: computer aided engineering; CAE; technology; deoxyribonucleic acid sequence; DNA sequence; prediction; Saccharomyces cerevisiae; multiple DNA sequences; multiple chromosomes; cross chromosomal similarities; compression.

DOI: 10.1504/IJCAET.2009.028551

International Journal of Computer Aided Engineering and Technology, 2009 Vol.1 No.4, pp.437-454

Published online: 18 Sep 2009
