Inderscience Publishers
  PUBLISHERS OF DISTINGUISHED ACADEMIC, SCIENTIFIC AND PROFESSIONAL JOURNALS

Article Abstract

Title: Clustering sequences by overlap
  Authors: Dietmar H. Dorr, Anne M. Denton
  Address (both authors): Department of Computer Science, North Dakota State University, Fargo, ND, 58105, USA
  Journal: International Journal of Data Mining and Bioinformatics, 2009, Vol. 3, No. 3, pp. 260-279
  Abstract: A clustering algorithm is introduced that combines the strengths of clustering and motif finding techniques. Clusters are identified based on unambiguously defined sequence sections as in motif finding algorithms. The definition of similarity within clusters allows transitive matches and, thereby, enables the discovery of remote homologies that cannot be found through motif-finding algorithms. Directed Acyclic Graph (DAG) structures are constructed that link short clusters to the longer ones. We compare the clustering results to the corresponding domains in the InterPro database. A second comparison shows that annotations based on our domains are inherently more consistent than those based on InterPro domains.
  Keywords: sequence clustering; motif finding; annotation; bioinformatics; DAG; directed acyclic graph; InterPro domains; similarity; transitive matches; remote homologies.
  DOI: 10.1504/IJDMB.2009.026701
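
  Illustration: The abstract states that similarity within clusters allows transitive matches, which is what lets remotely related sequences end up in the same cluster. The sketch below shows that general idea only and is not the authors' algorithm. It assumes equal-length sequence windows, a Hamming-distance match criterion, and a hypothetical `max_mismatches` threshold, none of which come from the paper. Windows that match directly are merged with a union-find structure, so two windows that never match each other can still share a cluster through an intermediate one.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def transitive_clusters(windows, max_mismatches=1):
    """Group windows so that direct matches are closed under transitivity.

    Assumption for illustration: two windows "match" when their Hamming
    distance is at most max_mismatches. The paper may define matches
    differently.
    """
    parent = list(range(len(windows)))

    def find(i):
        # Follow parent links to the cluster representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every directly matching pair.
    for i, j in combinations(range(len(windows)), 2):
        if hamming(windows[i], windows[j]) <= max_mismatches:
            parent[find(i)] = find(j)

    groups = {}
    for i, w in enumerate(windows):
        groups.setdefault(find(i), []).append(w)
    return sorted(sorted(g) for g in groups.values())


clusters = transitive_clusters(["ACDEF", "ACDEG", "ACDHG", "WWWWW"])
print(clusters)
```

  Here "ACDEF" and "ACDHG" differ in two positions and so do not match directly, yet both match "ACDEG" and therefore fall into one cluster, while "WWWWW" stays on its own.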