Title: Protein family structure signature for multidomain proteins
Authors: Jun Tan; Donald Adjeroh
Addresses: Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA; Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA
Abstract: The rapid increase in available protein structure datasets requires new techniques for fast yet effective analysis of protein 3D structures. In this work, we propose a structure-based signature for protein families, suitable for rapid analysis of multidomain protein structures. Our method is alignment-free and uses protein strings as the basic representation. A key novelty is the two-stage approach: an initial list of candidate protein superfamilies is rapidly identified using the protein family signature, and information retrieval methods are then applied only to the members of those candidate superfamilies. This approach is the key to both improved speed and improved structure retrieval accuracy. Experimental results, including comparisons with state-of-the-art methods, demonstrate the performance of the proposed protein family signature on queries with multidomain protein structures.
Keywords: protein structure; protein structure signature; retrieval; classification; alignment-free; structure analysis.
DOI: 10.1504/IJDMB.2018.094883
International Journal of Data Mining and Bioinformatics, 2018 Vol.20 No.4, pp.285-302
Received: 18 May 2018
Accepted: 12 Jun 2018
Published online: 25 Sep 2018
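
Illustrative sketch: The two-stage retrieval described in the abstract (first shortlist candidate superfamilies by a family-level signature, then rank only the members of those superfamilies) can be illustrated with a minimal Python sketch. This is not the authors' method. The abstract does not specify how protein strings or signatures are built, so everything here is an assumption made for illustration: the H/E/L string alphabet, the k-mer count profile, the pooled-count family signature, the cosine similarity, and all function and variable names.

```python
from collections import Counter
from math import sqrt


def kmer_profile(s, k=2):
    """Count overlapping k-mers in a protein string (assumed representation)."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count profiles."""
    dot = sum(v * b.get(key, 0) for key, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def family_signature(members, k=2):
    """Hypothetical family signature: pooled k-mer counts of all members."""
    sig = Counter()
    for s in members:
        sig.update(kmer_profile(s, k))
    return sig


def two_stage_search(query, database, top_families=1, k=2):
    """Stage 1 shortlists families by signature; stage 2 ranks their members."""
    q = kmer_profile(query, k)
    # Stage 1: compare the query against one signature per family.
    signatures = {fam: family_signature(list(members.values()), k)
                  for fam, members in database.items()}
    candidates = sorted(signatures,
                        key=lambda fam: cosine(q, signatures[fam]),
                        reverse=True)[:top_families]
    # Stage 2: score only the members of the shortlisted families.
    hits = [(cosine(q, kmer_profile(s, k)), fam, pid)
            for fam in candidates
            for pid, s in database[fam].items()]
    hits.sort(key=lambda h: h[0], reverse=True)
    return candidates, hits


# Toy database: family name -> {protein id -> made-up structure string}.
database = {
    "fam_helix": {"p1": "HHHHHHLLHHHHHH", "p2": "HHHHLHHHHHLL"},
    "fam_sheet": {"p3": "EEEELLEEEEEL", "p4": "EEELLLEEEE"},
}
candidates, hits = two_stage_search("HHHHHLLHHHH", database)
```

In this sketch the second stage never touches members of families that were filtered out in the first stage, which is the source of the speed-up the abstract attributes to the two-stage design.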