Title: Normalised compression distance and evolutionary distance of genomic sequences: comparison of clustering results

Authors: Massimo La Rosa, Salvatore Gaglio, Riccardo Rizzo, Alfonso Urso

Addresses: Department of Computer Science, University of Palermo, Viale delle Scienze, Ed. 6, Palermo, Italy; Department of Computer Science, University of Palermo, Viale delle Scienze, Ed. 6, Palermo, Italy; High Performance Computing and Networking Institute, Italian National Research Council (ICAR-CNR), Viale delle Scienze, Ed. 11, Palermo, Italy; High Performance Computing and Networking Institute, Italian National Research Council (ICAR-CNR), Viale delle Scienze, Ed. 11, Palermo, Italy

Abstract: Genomic sequences are usually compared using evolutionary distance, a procedure that requires aligning the sequences. Aligning long sequences is time-consuming, and the resulting dissimilarity is not a metric. Recently, the normalised compression distance was introduced as a method for calculating the distance between two generic digital objects, and it appears to be a suitable way to compare genomic strings. In this paper, the clustering and the non-linear mapping obtained using the evolutionary distance and the compression distance are compared, in order to understand whether the two sets of distances are similar.
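
For illustration only, the normalised compression distance mentioned in the abstract is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s and xy is the concatenation of the two objects. The following is a minimal sketch of that definition, not the authors' implementation. The use of zlib as the compressor, the function names and the toy sequences are all assumptions; the abstract does not say which compressor the paper uses.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of `data` after zlib compression (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation of both inputs
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Toy sequences, invented for this example only.
    seq_a = b"ACGTTGCAAGGCTTAACCGGTTAACGTACGTAGCTAGCTAGGATCC" * 20
    seq_b = seq_a.replace(b"GGATCC", b"GGTTCC")  # a lightly mutated copy
    print(ncd(seq_a, seq_b))
```

Values close to 0 suggest that the two inputs share most of their information, and values close to 1 suggest that they share little. With a real compressor the value is only an approximation and can slightly exceed 1. Note also that zlib has a 32 KB window, so a compressor with a longer memory would be needed for sequences of genomic length.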

Keywords: universal similarity metric; USM; clustering; DNA sequences; normalised compression distance; evolutionary distance; genomic sequences; nonlinear mapping; bioinformatics.

DOI: 10.1504/IJKESDP.2009.028987

International Journal of Knowledge Engineering and Soft Data Paradigms, 2009 Vol.1 No.4, pp.345 - 362

Published online: 19 Oct 2009
