Title: Improving protein-protein interaction article classification using biological domain knowledge

Authors: Yifei Chen; Hongjian Guo; Feng Liu; Bernard Manderick

Addresses: School of Information Science, Nanjing Audit University, 86 Yushan Rd (W), 211815, Nanjing, China; School of Information Science, Nanjing Audit University, 86 Yushan Rd (W), 211815, Nanjing, China; Computational Modeling Lab, Vrije Universiteit Brussel, Pleinlaan 2, B-1050, Brussels, Belgium; Computational Modeling Lab, Vrije Universiteit Brussel, Pleinlaan 2, B-1050, Brussels, Belgium

Abstract: Interaction Article Classification (IAC) is a text classification task in the biological domain that identifies which articles describe Protein-Protein Interactions (PPIs), so that PPIs can be extracted from the biological literature more efficiently. However, the text representation and feature weighting schemes commonly used for text classification are not well suited to IAC. To address this problem, we capture and utilise biological domain knowledge, namely gene mentions (the protein or gene names that appear in the articles). We put forward a new gene mention order-based approach to text representation that highlights the important role of gene mentions. We also incorporate information about gene mentions into a novel feature weighting scheme called Gene Mention-based Term Frequency (GMTF). Our experiments show that, with the proposed representation and weighting schemes, our Interaction Article Classifier (IACer) outperforms other current leading systems.

Keywords: text classification; protein-protein interaction; feature weighting; biological domain knowledge; data mining; PPI article classification; biological literature; PPI extraction; gene mentions; term frequency; bioinformatics.

DOI: 10.1504/IJDMB.2015.069415

International Journal of Data Mining and Bioinformatics, 2015 Vol.12 No.2, pp.144-166

Received: 02 Nov 2012
Accepted: 03 Jun 2013

Published online: 15 May 2015
