Title: Named entity recognition and classification in biomedical text using classifier ensemble

Authors: Sriparna Saha; Asif Ekbal; Utpal Kumar Sikdar

Addresses: Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, Bihar, India; Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, Bihar, India; Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, Bihar, India

Abstract: Named Entity Recognition and Classification (NERC) is an important task in information extraction for the biomedical domain. Biomedical named entities include mentions of proteins, genes, DNA, RNA, etc., which in general have complex structures and are difficult to recognise. In this paper, we propose a single objective optimisation based classifier ensemble technique that uses the search capability of a Genetic Algorithm (GA) for NERC in biomedical texts. Here, the GA is used to determine the voting weight of each class in each classifier. We use diverse classification methods, such as Conditional Random Field and Support Vector Machine, to build a number of models that differ in their representations of the set of features and/or feature templates. The proposed technique is evaluated on two benchmark datasets, namely JNLPBA 2004 and GENETAG, where it yields overall F-measure values of 75.97% and 95.90%, respectively. Comparisons with existing systems show that the proposed system achieves state-of-the-art performance.
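
Illustration: the per-class, per-classifier voting idea in the abstract can be sketched roughly as follows. This is a minimal sketch and not the authors' implementation. The function names (weighted_vote, evolve_weights), the GA operators (uniform crossover, random-reset mutation, keeping the top half as an elite), the parameter defaults, and the use of token accuracy as the single objective are all assumptions made for illustration; the abstract does not specify them.

```python
import random


def weighted_vote(predictions, weights, classes):
    """Combine the labels that the classifiers assign to one token.

    predictions[m] is the label output by classifier m.
    weights[m][label] is the vote weight classifier m carries for that label.
    """
    scores = {c: 0.0 for c in classes}
    for m, label in enumerate(predictions):
        scores[label] += weights[m][label]
    # Ties are broken by the order of `classes`.
    return max(classes, key=lambda c: scores[c])


def evolve_weights(dev_preds, gold, n_classifiers, classes,
                   pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Search for vote weights with a simple GA (hypothetical settings).

    dev_preds[i] is the list of labels the classifiers gave to token i,
    gold[i] is the reference label for token i.
    """
    rng = random.Random(seed)

    def random_individual():
        return [{c: rng.random() for c in classes}
                for _ in range(n_classifiers)]

    def fitness(ind):
        # Assumed objective: token accuracy of the ensemble on held-out data.
        hits = sum(weighted_vote(p, ind, classes) == g
                   for p, g in zip(dev_preds, gold))
        return hits / len(gold)

    def crossover(a, b):
        # Uniform crossover: each weight comes from either parent.
        return [{c: (a[m][c] if rng.random() < 0.5 else b[m][c])
                 for c in classes} for m in range(n_classifiers)]

    def mutate(ind):
        # Random-reset mutation: occasionally redraw a weight in [0, 1).
        return [{c: (rng.random() if rng.random() < mutation_rate else w)
                 for c, w in row.items()} for row in ind]

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    return max(population, key=fitness)


# Toy usage: one classifier that is trusted on "protein" outvotes two
# classifiers that are only weakly trusted on "O".
classes = ["O", "protein", "DNA"]
weights = [
    {"O": 0.2, "protein": 0.9, "DNA": 0.1},
    {"O": 0.4, "protein": 0.3, "DNA": 0.4},
    {"O": 0.4, "protein": 0.1, "DNA": 0.4},
]
print(weighted_vote(["protein", "O", "O"], weights, classes))  # protein
```

In this toy case a plain majority vote would return "O", whereas the weighted vote returns "protein" because 0.9 exceeds 0.4 + 0.4. This shows why learning a separate weight for each class in each classifier can change the ensemble output.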

Keywords: biomedical information retrieval; named entity recognition; named entity classification; biomedical texts; single objective optimisation; genetic algorithms; classifier ensemble; data mining; bioinformatics; conditional random field; support vector machines; SVM modelling.

DOI: 10.1504/IJDMB.2015.067954

International Journal of Data Mining and Bioinformatics, 2015 Vol.11 No.4, pp.365 - 391

Received: 20 Aug 2012
Accepted: 21 Feb 2013

Published online: 12 Mar 2015
