Title: Using Hybrid Hierarchical K-means (HHK) clustering algorithm for protein sequence motif Super-Rule-Tree (SRT) structure construction

Authors: Bernard Chen, Jieyue He, Stephen Pellicer, Yi Pan

Addresses: Department of Computer Science, University of Central Arkansas, 201 Donaghey Avenue, Conway, AR 72035, USA. Department of Computer Science, Georgia State University, 34 Peachtree Street, Atlanta, GA 30303, USA; and School of Computer Science and Engineering, Southeast University, Nanjing 210096, China. Department of Computer Science, Georgia State University, 34 Peachtree Street, Atlanta, GA 30303, USA. Department of Computer Science, Georgia State University, 34 Peachtree Street, Atlanta, GA 30303, USA.

Abstract: Many motif-discovery algorithms and techniques require a fixed window size to be defined in advance. Because the window size is fixed, these approaches often deliver a number of similar motifs that are simply shifted by a few bases or contain mismatches. To address this mismatched-motif problem, we use the super-rule concept to construct a Super-Rule-Tree (SRT) with a modified Hybrid Hierarchical K-means (HHK) clustering algorithm, which requires no parameter set-up to identify the similarities and dissimilarities between motifs. Analysis of the motifs generated by our approach shows that they are significant not only in sequence similarity but also in secondary structure similarity.
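The abstract does not spell out the modified HHK procedure, so the following is only a minimal sketch of the general hybrid idea, not the authors' algorithm. It assumes each motif has already been encoded as a fixed-length numeric vector (for example, a flattened position profile) and that Euclidean distance is an acceptable dissimilarity. Agglomerative clustering chooses K and seeds K-means, and that two-step split is applied recursively to grow a tree of motif groups. The centroid linkage, the "largest jump in merge distance" rule for choosing K, the `min_size` stopping threshold, and the names `hierarchical_seed`, `kmeans`, `build_tree` and `Node` are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def hierarchical_seed(points: List[Vector]) -> List[List[int]]:
    """Centroid-linkage agglomerative clustering, cut before the largest jump
    in merge distance (an assumed rule). The returned partition fixes K and
    supplies the initial K-means groups."""
    clusters = [[i] for i in range(len(points))]
    history = [[list(c) for c in clusters]]  # history[m]: partition after m merges
    heights: List[float] = []  # heights[m]: distance of merge history[m] -> history[m+1]
    while len(clusters) > 1:
        cents = [centroid([points[k] for k in c]) for c in clusters]
        d, i, j = min(
            (math.dist(cents[a], cents[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
        heights.append(d)
        history.append([list(c) for c in clusters])
    if len(heights) < 2:
        return [list(range(len(points)))]  # too few points to find a gap: no split
    m = max(range(1, len(heights)), key=lambda k: heights[k] - heights[k - 1])
    return history[m]


def kmeans(points: List[Vector], seeds: List[List[int]], max_iter: int = 100) -> List[List[int]]:
    """Lloyd's K-means started from the seed groups; empty clusters are dropped."""
    groups = seeds
    cents = [centroid([points[k] for k in g]) for g in groups]
    for _ in range(max_iter):
        labels = [min(range(len(cents)), key=lambda c: math.dist(p, cents[c])) for p in points]
        new_groups = [[k for k, lab in enumerate(labels) if lab == c] for c in range(len(cents))]
        new_groups = [g for g in new_groups if g]
        if new_groups == groups:
            break  # assignments stable: converged
        groups = new_groups
        cents = [centroid([points[k] for k in g]) for g in groups]
    return groups


@dataclass
class Node:
    members: List[int]  # indices into the original motif list
    children: List["Node"] = field(default_factory=list)


def build_tree(points: List[Vector], members: List[int] = None, min_size: int = 3) -> Node:
    """Recursively split each node with hierarchical seeding + K-means."""
    if members is None:
        members = list(range(len(points)))
    node = Node(members)
    if len(members) < min_size:
        return node  # hypothetical stopping rule: small groups become leaves
    local = [points[k] for k in members]
    groups = kmeans(local, hierarchical_seed(local))
    if len(groups) < 2:
        return node  # no split found: leaf
    node.children = [build_tree(points, [members[k] for k in g], min_size) for g in groups]
    return node
```

For three well-separated groups of toy vectors, the root splits into those three groups and each group is split again further down the tree. Letting the merge-distance profile choose K is one way to avoid supplying K by hand, in the spirit of the "no parameter set-up" claim, though this sketch still has its own assumed stopping threshold.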

Keywords: SRT; super-rule-tree; HHK; hybrid hierarchical k-means; clustering algorithms; protein sequence motif; bioinformatics; mismatched motifs; protein sequences.

DOI: 10.1504/IJDMB.2010.033523

International Journal of Data Mining and Bioinformatics, 2010 Vol.4 No.3, pp.316 - 330

Published online: 02 Jun 2010
