Title: Scalable biomedical Named Entity Recognition: investigation of a database-supported SVM approach

Authors: Mona Soliman Habib, Jugal Kalita

Addresses: Cairo Microsoft Innovation Lab, 306 Korniche El-Nile, Maadi, Cairo, Egypt; Department of Computer Science, University of Colorado, 1420 Austin Bluffs Pkwy, Colorado Springs, CO 80918, USA

Abstract: This paper explores scalability issues associated with the Named Entity Recognition (NER) problem in the biomedical publications domain using Support Vector Machines (SVMs). Performance results obtained with existing binary and multi-class SVM implementations on increasing amounts of training data are compared with results obtained using our new implementations. Our approach eliminates the need for prior language- or domain-specific knowledge and achieves good out-of-the-box accuracy, comparable to that of more complex approaches. The training time of multi-class SVMs is reduced by several orders of magnitude, which would make SVMs a more viable and practical solution for real-world problems with large datasets.
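To make the setting concrete, here is a minimal sketch of NER framed as per-token multi-class classification built from binary SVMs. This is not the authors' database-supported implementation. Every specific below is a hypothetical choice made for illustration:

- the feature templates (word, 3-character suffix, capitalization, digit, and the words immediately to the left and right), which use no lexicons or domain rules;
- the one-vs-rest decomposition;
- the dual coordinate descent solver for a linear L1-loss SVM (the kind of solver used in LIBLINEAR-style libraries);
- the parameters `C` and `passes`;
- the toy IOB-tagged sentences.

```python
from collections import defaultdict

# Hypothetical toy corpus in IOB format (word, tag); not data from the paper.
TRAIN = [
    [("IL-2", "B-protein"), ("gene", "O"), ("expression", "O"),
     ("requires", "O"), ("NF-kappa", "B-protein"), ("B", "I-protein")],
    [("Activation", "O"), ("of", "O"), ("CD28", "B-protein"),
     ("induces", "O"), ("transcription", "O")],
]


def token_features(words, i):
    """Surface features for token i: the word, its suffix, simple shape cues,
    and the neighboring words. No lexicons, POS tags, or domain rules."""
    w = words[i]
    feats = {
        "BIAS": 1.0,
        "w=" + w.lower(): 1.0,
        "suf3=" + w[-3:].lower(): 1.0,
        "prev=" + (words[i - 1].lower() if i > 0 else "<s>"): 1.0,
        "next=" + (words[i + 1].lower() if i + 1 < len(words) else "</s>"): 1.0,
    }
    if w[:1].isupper():
        feats["cap"] = 1.0
    if any(ch.isdigit() for ch in w):
        feats["digit"] = 1.0
    return feats


def dot(weights, feats):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_binary_svm(xs, ys, C=10.0, passes=1000):
    """Linear L1-loss SVM trained by dual coordinate descent, minimizing
    0.5*||w||^2 + C * sum_i max(0, 1 - y_i * w.x_i)."""
    w = defaultdict(float)
    alpha = [0.0] * len(xs)
    qii = [sum(v * v for v in x.values()) for x in xs]  # diagonal of Q
    for _ in range(passes):
        for i, (x, y) in enumerate(zip(xs, ys)):
            grad = y * dot(w, x) - 1.0                  # dual gradient for alpha_i
            new_alpha = min(max(alpha[i] - grad / qii[i], 0.0), C)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                for f, v in x.items():                  # keep w = sum alpha_i y_i x_i
                    w[f] += delta * y * v
                alpha[i] = new_alpha
    return dict(w)


def train_one_vs_rest(sentences, C=10.0, passes=1000):
    """One binary SVM per tag: tokens with that tag are +1, all others -1."""
    xs, labels = [], []
    for sent in sentences:
        words = [w for w, _ in sent]
        for i, (_, tag) in enumerate(sent):
            xs.append(token_features(words, i))
            labels.append(tag)
    models = {}
    for tag in sorted(set(labels)):
        ys = [1.0 if t == tag else -1.0 for t in labels]
        models[tag] = train_binary_svm(xs, ys, C, passes)
    return models


def predict(models, words):
    """Tag each token with the class whose binary SVM scores highest."""
    tags = []
    for i in range(len(words)):
        x = token_features(words, i)
        tags.append(max(models, key=lambda t: dot(models[t], x)))
    return tags


if __name__ == "__main__":
    models = train_one_vs_rest(TRAIN)
    for sent in TRAIN:
        words = [w for w, _ in sent]
        print(list(zip(words, predict(models, words))))
```

A one-vs-rest scheme like this trains one binary SVM per tag over the full token set. Each of those binary problems is as large as the whole training corpus. This gives a sense of why multi-class SVM training time becomes a concern as the corpus grows.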

Keywords: NER; named entity recognition; SVMs; support vector machines; database extension; bioinformatics; biomedical publications; large datasets.

DOI: 10.1504/IJBRA.2010.032121

International Journal of Bioinformatics Research and Applications, 2010, Vol. 6, No. 2, pp. 191-208

Published online: 10 Mar 2010
