Title: Several remarks on the metric space of genetic codes

Authors: David Weisman; Dan A. Simovici

Addresses: Department of Biology, University of Massachusetts Boston, 100 Morrissey Blvd., Boston, Massachusetts 02125, USA; Department of Computer Science, University of Massachusetts Boston, 100 Morrissey Blvd., Boston, Massachusetts 02125, USA

Abstract: A genetic code, the mapping from trinucleotide codons to amino acids, can be viewed as a partition of the set of 64 codons. A small number of non-standard genetic codes are known, and these codes can be compared mathematically through the partitions they induce on the codon set. To measure distances between set partitions, this study defines a parameterised family of metric functions whose special cases include a metric based on Shannon entropy. Distances were computed for 17 curated genetic codes using four members of this family. Under these metrics, nuclear genetic codes had relatively small pairwise distances, whereas mitochondrial codes exhibited greater variance. Hierarchical clustering with Ward's method produced a tight grouping of nuclear codes and several distinct clades of mitochondrial codes. This family of functions may also be applied to other biological problems involving set partitions, such as microarray data analysis, gene set enrichment and protein-protein interaction mapping.
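The abstract does not spell out the metric family, so the sketch below is an illustration under stated assumptions rather than the authors' method. It assumes the family is built on the generalised (Havrda-Charvat/Daroczy) beta-entropy, H_beta(pi) = (1 - sum_i p_i^beta) / (1 - 2^(1-beta)), with distance d_beta(pi, sigma) = 2 H_beta(pi ^ sigma) - H_beta(pi) - H_beta(sigma), where pi ^ sigma is the meet (block-wise intersection) of the two partitions. Under this assumption beta = 1 recovers Shannon entropy in bits and beta = 2 gives twice the Gini index, which matches the keywords. It also assumes stop codons form a single block. The helper names (`code_partition`, `beta_entropy`, `meet`, `partition_distance`) are hypothetical, and the variant code (TGA reassigned from stop to Trp) is purely illustrative.

```python
import math
from collections import Counter
from itertools import product

# Codons in NCBI order: first base varies slowest, bases ordered T, C, A, G.
BASES = "TCAG"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]

# NCBI translation table 1 (standard code); '*' marks stop codons.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def code_partition(aa_string):
    """Hypothetical helper: map each codon to its block label.

    Assumption: all stop codons ('*') share one block of the partition.
    """
    assert len(aa_string) == 64
    return dict(zip(CODONS, aa_string))


def beta_entropy(labels, beta):
    """Generalised beta-entropy of the partition induced by `labels`.

    beta == 1 gives Shannon entropy in bits; beta == 2 gives 2 * Gini index.
    """
    n = len(labels)
    probs = [k / n for k in Counter(labels.values()).values()]
    if beta == 1:
        return -sum(p * math.log2(p) for p in probs)
    return (1 - sum(p ** beta for p in probs)) / (1 - 2 ** (1 - beta))


def meet(a, b):
    """Meet of two partitions on the same codon set: blocks are the
    non-empty intersections of a block of `a` with a block of `b`."""
    return {c: (a[c], b[c]) for c in a}


def partition_distance(a, b, beta=1.0):
    """Assumed metric: d_beta(a, b) = 2 H(a ^ b) - H(a) - H(b)."""
    h_meet = beta_entropy(meet(a, b), beta)
    return 2 * h_meet - beta_entropy(a, beta) - beta_entropy(b, beta)


std = code_partition(STANDARD)

# Illustrative variant only: reassign TGA from stop to tryptophan (W).
i = CODONS.index("TGA")
variant = code_partition(STANDARD[:i] + "W" + STANDARD[i + 1:])

for beta in (1.0, 2.0, 3.0):  # arbitrary choice of family members
    print(beta, partition_distance(std, variant, beta))
```

Encoding each partition as a codon-to-label dictionary keeps the meet trivial: pairing the two labels of each codon yields exactly the intersection blocks. For this toy variant the Shannon case (beta = 1) works out to 3 log2(3) / 64 bits, since only the stop block {TAA, TAG, TGA} and the Trp block are split or merged.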

Keywords: non-standard genetic codes; metric space; set partitions; Shannon entropy; data mining; clustering; classification; discretisation; Gini index; bioinformatics; nuclear genetic codes; mitochondrial codes; microarray data analysis; gene set enrichment; protein-protein interaction; PPI.

DOI: 10.1504/IJDMB.2012.045534

International Journal of Data Mining and Bioinformatics, 2012 Vol.6 No.1, pp. 17-26

Received: 05 Oct 2009
Accepted: 28 Dec 2009

Published online: 17 Dec 2014
