Title: Protein family structure signature for multidomain proteins

Authors: Jun Tan; Donald Adjeroh

Addresses: Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA; Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA

Abstract: The rapid increase in available protein structure datasets requires new techniques for fast yet effective analysis of protein 3D structures. In this work, we propose a structure-based signature for protein families, suitable for rapid analysis of multidomain protein structures. Our method is alignment-free, using protein strings as the basic representation. A key novelty is the two-stage approach, whereby an initial list of candidate protein superfamilies is rapidly identified using the protein family signature, and then information retrieval methods are applied only to the members of the candidate superfamilies. This approach is the key to both improved speed and improved structure retrieval accuracy. Experimental results, including comparative results with state-of-the-art methods, demonstrate the performance of the proposed protein family signature on queries with multidomain protein structures.

Keywords: protein structure; protein structure signature; retrieval; classification; alignment-free; structure analysis.

DOI: 10.1504/IJDMB.2018.094883

International Journal of Data Mining and Bioinformatics, 2018 Vol.20 No.4, pp.285 - 302

Received: 18 May 2018
Accepted: 12 Jun 2018

Published online: 25 Sep 2018
