A hybrid named entity tagger for tagging human proteins/genes
by Kalpana Raja; Suresh Subramani; Jeyakumar Natarajan
International Journal of Data Mining and Bioinformatics (IJDMB), Vol. 10, No. 3, 2014

Abstract: The primary and prerequisite step in the analysis of scientific literature is the extraction of gene/protein names from biomedical texts. Although many taggers are available for this Named Entity Recognition (NER) task, we found that none of them achieves state-of-the-art performance in tagging human genes/proteins. As most current text mining research concerns the human literature, a tagger that precisely tags human genes and proteins is highly desirable. In this paper, we propose a new hybrid approach based on (a) a machine learning algorithm (conditional random fields), (b) a set of manually constructed rules, and (c) a novel abbreviation identification algorithm, to overcome the common errors observed in available taggers when tagging human genes/proteins. Experimental results on the JNLPBA2004 corpus show that our domain-specific approach achieves a high precision of 80.47 and an F-score of 75.77, outperforming most state-of-the-art systems. However, the recall of 71.60 remains low and leaves much room for future improvement.
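
The reported F-score is the balanced F-measure, i.e. the harmonic mean of precision P and recall R, F = 2PR / (P + R). The short sketch below recomputes it from the precision and recall quoted in the abstract. The function name `f_score` and the generalised `beta` parameter are illustrative assumptions, not part of the paper.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta measure; beta = 1 gives the balanced F-score (harmonic mean of P and R)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing is correctly tagged
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract (JNLPBA2004 corpus)
p, r = 80.47, 71.60
print(f"F1 = {f_score(p, r):.2f}")
```

With the rounded figures above the result is about 75.78, one hundredth above the published 75.77, presumably because the published precision and recall are themselves rounded.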

Online publication date: Tue, 21-Oct-2014
