Title: A machine learning-based system to normalise gene mentions to unique database identifiers

Authors: Yifei Chen; Feng Liu; Bernard Manderick

Addresses: Yifei Chen: School of Information Sciences, Nanjing Audit University, Nanjing 211815, China; and Computational Modeling Lab, Department of Informatics, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium. Feng Liu and Bernard Manderick: Computational Modeling Lab, Department of Informatics, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium.

Abstract: In this paper, we propose an integrated Gene normaliser (GNer) that assigns a unique database identifier to each recognised gene mention in the biological literature. GNer combines Support Vector Machines (SVMs) with several rule-based components. First, we construct a dictionary from EntrezGene and BioThesaurus. Next, a purpose-built pre-processor reduces variation and ambiguity among synonyms. Finally, an SVM-based disambiguation filter resolves the ambiguities that remain after exact matching. The experimental results show that GNer performs well, with a precision of 80.5%, a recall of 86.4% and an Fβ=1 measure of 83.4.
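The pipeline above can be illustrated with a minimal sketch of its dictionary stages: synonyms are normalised, indexed, and matched exactly, and a mention that maps to more than one identifier is flagged as ambiguous (the point at which GNer would apply its SVM filter, which is not reproduced here). The normalisation rules (lowercasing, spelling out Greek letters, stripping punctuation), the `GENE:n` identifiers and the toy lexicon are all hypothetical assumptions, not GNer's actual pre-processor or real EntrezGene records. The sketch also computes the balanced F-measure from the reported precision and recall.

```python
import re
from collections import defaultdict

# Hypothetical normalisation rules (an assumption, not GNer's actual pre-processor).
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma", "κ": "kappa"}

def normalise(mention: str) -> str:
    """Lowercase, spell out Greek letters, then drop every non-alphanumeric character."""
    s = mention.lower()
    for letter, name in GREEK.items():
        s = s.replace(letter, name)
    return re.sub(r"[^a-z0-9]", "", s)

# Toy lexicon with hypothetical placeholder IDs (not real EntrezGene records).
ENTRIES = [
    ("GENE:1", "IL-2"),
    ("GENE:1", "interleukin 2"),
    ("GENE:2", "IL2"),
    ("GENE:3", "TNF-α"),
]

def build_dictionary(entries):
    """Map each normalised synonym to the set of identifiers that share it."""
    index = defaultdict(set)
    for gene_id, synonym in entries:
        index[normalise(synonym)].add(gene_id)
    return index

def candidates(index, mention):
    """Exact match on the normalised form; more than one ID means the match is ambiguous."""
    return sorted(index.get(normalise(mention), set()))

def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall; beta=1 gives the balanced F-measure."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

if __name__ == "__main__":
    index = build_dictionary(ENTRIES)
    print(candidates(index, "Il 2"))       # ['GENE:1', 'GENE:2'] -> ambiguous, needs disambiguation
    print(candidates(index, "TNF alpha"))  # ['GENE:3'] -> unique match
    print(round(100 * f_beta(0.805, 0.864), 2))  # 83.35
```

With the rounded inputs 80.5% and 86.4%, the harmonic mean comes out at about 83.35, consistent with the reported Fβ=1 of 83.4 once rounding of precision and recall is taken into account.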

Keywords: gene normalisation; gene mention recognition; disambiguation; text mining; bioinformatics; machine learning; SVM; support vector machines.

DOI: 10.1504/IJDMB.2011.045415

International Journal of Data Mining and Bioinformatics, 2011 Vol.5 No.6, pp. 640-660

Received: 24 Jun 2009
Accepted: 12 Mar 2010

Published online: 24 Jan 2015
