Title: LFXtractor: Text chunking for long form detection from biomedical text

Authors: Min Song, Hongfang Liu

Addresses: Information Systems Department, College of Computing Sciences, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA; Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University, 4000 Reservoir Rd NW, Washington, DC 20057, USA

Abstract: In this paper, we propose a novel method for detecting the long forms (LFs) that correspond to short forms (SFs) in biomedical text. The method differs from previous approaches in three ways: it incorporates lexical analysis techniques into supervised learning for extracting abbreviations; it utilises text-chunking techniques to identify the LFs of abbreviations; and it significantly improves recall. Experimental results on the Gold Standard Development corpus show that our approach outperforms the leading abbreviation-extraction algorithms ExtractAbbrev, ALICE and Acrophile, as well as a collocation-based approach, by at least 4.8%, 6.0%, 9.0% and 6.0%, respectively, in both precision and recall.

Keywords: text mining; text chunking; abbreviation extraction; biomedical text; supervised learning; abbreviations; long forms.

DOI: 10.1504/IJFIPM.2010.037148

International Journal of Functional Informatics and Personalised Medicine, 2010 Vol.3 No.2, pp. 89-102

Published online: 29 Nov 2010
