Title: Clinical text classification under the Open and Closed Topic Assumptions

Authors: Yutaka Sasaki, Brian Rea, Sophia Ananiadou

Addresses: School of Computer Science, University of Manchester MIB, 131 Princess Street, Manchester, M1 7DN, UK; National Centre for Text Mining, School of Computer Science, University of Manchester MIB, 131 Princess Street, Manchester, M1 7DN, UK; National Centre for Text Mining, School of Computer Science, University of Manchester MIB, 131 Princess Street, Manchester, M1 7DN, UK

Abstract: This paper investigates multi-topic aspects of the automatic classification of clinical free text in comparison with general text. We adopt two different views of multi-topic assignment: the Closed Topic Assumption (CTA) and the Open Topic Assumption (OTA). Experimental results show that multi-topic assignments in the Computational Medicine Centre (CMC) Medical NLP Challenge data are strongly OTA-oriented, whereas those in the general-text Reuters-21578 collection fall in the middle of the spectrum between OTA and CTA.

Keywords: text classification; multi-topic assignments; CTA; closed topic assumption; bioinformatics; clinical texts; free text; open topic assumption; OTA.

DOI: 10.1504/IJDMB.2009.026703

International Journal of Data Mining and Bioinformatics, 2009 Vol. 3 No. 3, pp. 299-313

Published online: 23 Jun 2009
