Title: A hybrid named entity tagger for tagging human proteins/genes

Authors: Kalpana Raja; Suresh Subramani; Jeyakumar Natarajan

Addresses: Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar University, Coimbatore 641 046, India (all three authors)

Abstract: Extracting gene/protein names from biomedical texts is a primary step in, and a prerequisite for, the analysis of scientific literature. Although many taggers are available for this Named Entity Recognition (NER) task, we found that none of them achieves state-of-the-art tagging of human genes/proteins. As most current text mining research concerns human-related literature, a tagger that precisely tags human genes and proteins is highly desirable. In this paper, we propose a new hybrid approach based on (a) a machine learning algorithm (conditional random fields), (b) a set of manually constructed rules, and (c) a novel abbreviation identification algorithm, designed to overcome the common errors that available taggers make when tagging human genes/proteins. Experimental results on the JNLPBA2004 corpus show that our domain-specific approach achieves a high precision of 80.47 and an F-score of 75.77, outperforming most state-of-the-art systems. However, the recall of 71.60 remains low and leaves much room for future improvement.
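
The three figures reported in the abstract can be cross-checked against each other. The sketch below assumes the reported F-score is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` and the `beta` parameter are illustrative and do not come from the paper.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract, in percent.
# Assumption: the reported F-score is the balanced F1.
precision, recall = 80.47, 71.60
f1 = f_score(precision, recall)
print(f"F1 from reported precision and recall: {f1:.2f}")
```

Under that assumption, the recomputed value agrees with the reported 75.77 to within about 0.01, a difference that is plausibly due to rounding of the published precision and recall.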

Keywords: named entity recognition; hybrid taggers; biomedical text mining; protein tagging; gene tagging; human proteins; human genes; biomedical literature; machine learning; abbreviation identification; bioinformatics.

DOI: 10.1504/IJDMB.2014.064545

International Journal of Data Mining and Bioinformatics, 2014 Vol.10 No.3, pp.315 - 328

Received: 02 Nov 2012
Accepted: 23 Apr 2013

Published online: 21 Oct 2014
