Title: PPI-IRO: a two-stage method for protein-protein interaction extraction based on interaction relation ontology

Authors: Chuan-Xi Li; Peng Chen; Ru-Jing Wang; Xiu-Jie Wang; Ya-Ru Su; Jinyan Li

Addresses: Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, China; School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China | Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, China; School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China; Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia | Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, China; School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China | State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China | Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, China; School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China | Advanced Analytics Institute, University of Technology Sydney, Australia

Abstract: Mining Protein-Protein Interactions (PPIs) from the fast-growing biomedical literature has proven to be an effective approach for identifying biological regulatory networks. This paper presents a novel method based on an Interaction Relation Ontology (IRO), which specifies and organises the words used to describe various protein interaction relationships. Our method extracts PPIs in two stages. First, IRO is applied in a binary classifier to determine whether a sentence contains a relation. Then, IRO guides PPI extraction over a dependency parse tree built for each sentence. Comprehensive quantitative evaluations and detailed analyses demonstrate the strong performance of IRO on both relation-sentence classification and PPI extraction. Our PPI extraction method achieved recalls of around 80% and 90% and F1 scores of around 54% and 66% on the AIMed and BioInfer corpora, respectively, outperforming most existing extraction methods.
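The two-stage method described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the miniature `IRO` dictionary, the keyword rule standing in for the paper's trained binary classifier, the `Dep` triple type, the Stanford-style dependency labels (`nsubj`, `dobj`, `prep_with`), and the protein names `ProtA`/`ProtB` are all hypothetical, and the dependency parse is supplied by hand rather than produced by a parser.

```python
from dataclasses import dataclass

# Hypothetical miniature IRO: relation categories mapped to trigger words.
# The paper's ontology is much richer; these entries are illustrative only.
IRO = {
    "binding": {"bind", "binds", "bound", "interact", "interacts", "associates"},
    "regulation": {"activate", "activates", "inhibit", "inhibits", "phosphorylates"},
}

# Invert the ontology so each trigger word maps to its category.
RELATION_WORDS = {word: cat for cat, words in IRO.items() for word in words}

# Assumed Stanford-style typed-dependency labels for the two arguments.
SUBJECT_RELS = {"nsubj", "nsubjpass"}
OBJECT_RELS = {"dobj", "prep_with", "prep_to"}


@dataclass(frozen=True)
class Dep:
    rel: str   # typed dependency label, e.g. "nsubj"
    head: int  # token index of the governor
    dep: int   # token index of the dependent


def stage1_has_relation(tokens, proteins):
    """Stage 1 stand-in: keep a sentence only if it names at least two
    proteins and contains at least one IRO trigger word. The paper uses a
    trained binary classifier here; this keyword rule is a simplification."""
    n_proteins = sum(1 for t in tokens if t in proteins)
    has_trigger = any(t.lower() in RELATION_WORDS for t in tokens)
    return n_proteins >= 2 and has_trigger


def stage2_extract(tokens, proteins, deps):
    """Stage 2 sketch: for each IRO trigger word, pair the proteins attached
    to it as subject and as object in the dependency parse."""
    pairs = []
    for i, tok in enumerate(tokens):
        category = RELATION_WORDS.get(tok.lower())
        if category is None:
            continue
        # Direct dependents of the trigger word that are protein mentions.
        children = [d for d in deps if d.head == i and tokens[d.dep] in proteins]
        subjects = [d.dep for d in children if d.rel in SUBJECT_RELS]
        objects = [d.dep for d in children if d.rel in OBJECT_RELS]
        for s in subjects:
            for o in objects:
                pairs.append((tokens[s], tok, tokens[o], category))
    return pairs


def extract_ppis(tokens, proteins, deps):
    """Run stage 2 only on sentences that pass stage 1."""
    if not stage1_has_relation(tokens, proteins):
        return []
    return stage2_extract(tokens, proteins, deps)


if __name__ == "__main__":
    tokens = ["ProtA", "binds", "ProtB", "in", "vivo"]
    proteins = {"ProtA", "ProtB"}
    deps = [Dep("nsubj", 1, 0), Dep("dobj", 1, 2), Dep("prep_in", 1, 4)]
    print(extract_ppis(tokens, proteins, deps))
    # [('ProtA', 'binds', 'ProtB', 'binding')]
```

Stage 2 here only pairs proteins that attach directly to a trigger word; a fuller version would also need to follow longer dependency paths and handle constructions such as coordination and negation.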
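The abstract reports recall and F1 but not precision. Because F1 is the harmonic mean of precision P and recall R, rearranging F1 (P + R) = 2PR gives P = F1 R / (2R - F1), so the missing precision can be backed out. The sketch below does that arithmetic. The inputs are the rounded "around" values from the abstract, so the resulting figures (roughly 41% on AIMed and 52% on BioInfer) are back-of-the-envelope estimates, not numbers reported by the authors; the function names are illustrative.

```python
def implied_precision(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for precision P.

    Rearranging F1 * (P + R) = 2PR gives P = F1 * R / (2R - F1).
    """
    if not (0 < f1 < 2 * recall):
        raise ValueError("need 0 < F1 < 2 * recall")
    return f1 * recall / (2 * recall - f1)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Rounded (recall, F1) figures from the abstract, per corpus.
    for corpus, r, f in [("AIMed", 0.80, 0.54), ("BioInfer", 0.90, 0.66)]:
        p = implied_precision(r, f)
        print(f"{corpus}: implied precision ~ {p:.2f}")
    # AIMed: implied precision ~ 0.41
    # BioInfer: implied precision ~ 0.52
```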

Keywords: protein-protein interaction; PPI extraction; interaction relation ontology; relation words; sentence typed dependency; relation extraction; text mining; information extraction; bioinformatics; biomedical literature; biological regulatory networks; relationship extraction.

DOI: 10.1504/IJDMB.2014.062890

International Journal of Data Mining and Bioinformatics, 2014, Vol. 10, No. 1, pp. 98-119

Received: 03 Oct 2011
Accepted: 25 Jan 2012

Published online: 21 Oct 2014
