Title: CLUSS2: an alignment-independent algorithm for clustering protein families with multiple biological functions

Authors: Abdellali Kelil, Shengrui Wang, Ryszard Brzezinski

Addresses: Faculty of Sciences, ProspectUS Laboratory, Department of Computer Sciences, University of Sherbrooke, Sherbrooke, J1K 2R1 QC, Canada; Faculty of Sciences, ProspectUS Laboratory, Department of Computer Sciences, University of Sherbrooke, Sherbrooke, J1K 2R1 QC, Canada; Faculty of Sciences, Microbiology and Biotechnology Laboratory, Department of Biology, University of Sherbrooke, Sherbrooke, J1K 2R1 QC, Canada

Abstract: CLUSS is an algorithm proposed for clustering both alignable and non-alignable protein sequences. However, CLUSS tends to be ineffective on protein datasets that include a large number of biochemical activities. To overcome this difficulty, we propose in this paper a new algorithm, named CLUSS2, that scales better as the number of biochemical activities increases. CLUSS2 differs from CLUSS in many ways, including protein sequence representation, conserved motif extraction and time efficiency. Our experiments show that CLUSS2 more accurately highlights the functional characteristics of the clustered families, especially those with a large number of biochemical activities.

Keywords: clustering; similarity measures; biological functions; non-alignable protein sequences; protein families; protein clusters; motifs extraction; biochemical activities.

DOI: 10.1504/IJCBDD.2008.020190

International Journal of Computational Biology and Drug Design, 2008 Vol.1 No.2, pp.122 - 140

Published online: 08 Sep 2008
