Title: Novel efficient granular computing models for protein sequence motifs and structure information discovery

Authors: Bernard Chen, Stephen Pellicer, Phang C. Tai, Robert Harrison, Yi Pan

Addresses: Department of Computer Science, University of Central Arkansas, 201 Donaghey Ave. MCST304, Conway, AR 72035, USA; Department of Computer Science, Georgia State University, 34 Peachtree Street, Room 1417, Atlanta, GA 30303, USA; Department of Biology, Georgia State University, 402 Kell Hall, 24 Peachtree Center Ave., Atlanta, GA 30303, USA; Department of Computer Science, Georgia State University, 34 Peachtree Street, Room 1440, Atlanta, GA 30303, USA; Department of Computer Science, Georgia State University, 34 Peachtree Street, Room 1440, Atlanta, GA 30303, USA

Abstract: Protein sequence motifs have the potential to determine the conformation, function and activities of proteins. To obtain protein sequence motifs that are universally conserved across protein family boundaries, we use an extremely large input dataset, unlike most popular motif discovery algorithms. As a result, an efficient technique is required. We create two granular computing models, the FIK and FGK models, to efficiently generate protein motif information that transcends protein family boundaries. We have performed a comprehensive comparison between the two models. In addition, we combine the results from the FIK and FGK models to generate our best sequence motif information.

Keywords: clustering; FGK model; FIK model; greedy K-means; sequence motifs; protein structures; granular computing; modelling; protein sequences.

DOI: 10.1504/IJCBDD.2009.028822

International Journal of Computational Biology and Drug Design, 2009 Vol.2 No.2, pp.168 - 186

Available online: 03 Oct 2009
