Title: ECPF: an efficient algorithm for expanding clustered protein families

Authors: Zhongyang Zuo; Yanheng Liu; Liyan Zhao; Li Xu; Jian Wang; Xiaoyan Lv

Addresses: College of Computer Science and Technology, Jilin University, Changchun 130012, China ' College of Computer Science and Technology, Jilin University, Changchun 130012, China ' The Second Hospital, Jilin University, Changchun 130012, China ' The Second Hospital, Jilin University, Changchun 130012, China ' College of Computer Science and Technology, Jilin University, Changchun 130012, China; Zhuhai Laboratory of Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Department of Computer Science and Technology, Zhuhai College of Jilin University, Zhuhai 519041, China ' The Second Hospital, Jilin University, Changchun 130012, China

Abstract: With the rapid development of gene sequencing technology, an era of explosive growth in protein sequence data has arrived. How to handle such a huge number of protein sequences has become a serious concern in the research field. An effective solution is to cluster homologous sequences into separate protein families. Proteins that belong to the same family share similar structure and/or gene function, so known proteins can provide valuable evidence for characterising unknown ones. We present an efficient and effective algorithm called Expanding Clustered Protein Families (ECPF), which optimises protein sequences that have already been clustered. The results show that ECPF is capable of discovering unknown connections between storing space and families in large-scale databases while incurring an acceptable overhead in computational time. ECPF successfully expands the protein sequence network and, furthermore, creates a more practical protein sequence topology for supporting biological research.

Keywords: clustering; expanded protein families; protein links; similarity score; protein sequences; bioinformatics.

DOI: 10.1504/IJDMB.2016.082209

International Journal of Data Mining and Bioinformatics, 2016 Vol.16 No.4, pp.312 - 327

Received: 20 Feb 2016
Accepted: 26 Dec 2016

Published online: 12 Feb 2017
