Title: A general protein-protein interaction extraction architecture based on word representation and feature selection

Authors: Zhenchao Jiang; Lishuang Li; Degen Huang

Addresses: School of Computer Science and Technology, Dalian University of Technology, No. 2, Linggong Road, Hi-Tech Zone, Dalian 116024, China (all authors)

Abstract: Previous research has shown that supervised Protein-Protein Interaction Extraction (PPIE) can achieve high accuracy with elaborately selected features and kernels. However, most features and kernels rely on domain knowledge and natural language analysis, which makes supervised models expensive, heavy and brittle. Moreover, commonly used representation techniques, such as one-hot encoding and the Vector Space Model, fail to capture the semantic similarity between words. To reduce manual labour and take advantage of semantic representation, we put forward a general instance representation architecture for PPIE, which integrates word representation, vector composition and feature selection. Our method obtains F-scores of 69.7, 78.8, 72.3, 72.0 and 83.7 on AIMed, BioInfer, HPRD50, IEPA and LLL respectively.
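
Illustration: the abstract's point that one-hot encoding cannot capture semantic similarity can be seen in a small sketch. The vocabulary, the two-dimensional dense vectors and the averaging-based `compose` function below are hypothetical values chosen for illustration only. They are not the embeddings or the composition method used in the paper, which the abstract does not specify.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical three-word vocabulary (assumption, not from the paper).
vocab = ["binds", "interacts", "table"]

def one_hot(word):
    """One-hot vector: 1.0 at the word's index, 0.0 elsewhere."""
    return [1.0 if w == word else 0.0 for w in vocab]

# Made-up dense word vectors; real ones would be learned from a corpus.
dense = {
    "binds": [0.9, 0.1],
    "interacts": [0.8, 0.2],
    "table": [0.1, 0.9],
}

def compose(words):
    """Compose word vectors into one instance vector by averaging.
    Averaging is an assumed stand-in for the paper's vector composition."""
    vecs = [dense[w] for w in words]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# Distinct one-hot vectors are always orthogonal, so similarity is zero.
one_hot_sim = cosine(one_hot("binds"), one_hot("interacts"))
# Dense vectors can place related words close together.
dense_sim = cosine(dense["binds"], dense["interacts"])
unrelated_sim = cosine(dense["binds"], dense["table"])
instance_vec = compose(["binds", "interacts"])

print(one_hot_sim, round(dense_sim, 3), round(unrelated_sim, 3))
```

Under these toy values, the one-hot similarity between "binds" and "interacts" is exactly zero, while the dense vectors score the pair as far more similar than "binds" and "table". A feature selection step would then operate on instance vectors such as `instance_vec`.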

Keywords: instance representation; word representation; protein-protein interaction; PPI extraction; relation extraction; biomedical text mining; feature selection; vector composition; semantic similarity; bioinformatics.

DOI: 10.1504/IJDMB.2016.074878

International Journal of Data Mining and Bioinformatics, 2016 Vol.14 No.3, pp.276-291

Received: 06 Apr 2015
Accepted: 07 Aug 2015

Published online: 22 Feb 2016
