Title: Supervised document classification based upon domain-specific term taxonomies

Authors: Francesco Bellomi, Matteo Cristani

Addresses: Dipartimento di Informatica, Università di Verona, Ca' Vignal 2, Strada Le Grazie 15, I-37134 Verona, Italy (both authors)

Abstract: Document classification is a topic of growing interest in recent terminological research, especially in technologically oriented studies. Several sophisticated techniques have been developed that classify documents by recognising specific linguistic features, such as particular terms or phrase occurrences. Only a limited number of real document classification applications use natural language processing techniques that combine statistical analysis with human supervision, so that the system fully automates the classification process while the construction of the taxonomy remains an entirely human-centred activity. In this paper we focus on an application with these features and then introduce a methodology that makes use of it. The fundamental argument in favour of a specific methodology is that the analysis leading to the deployment of the term taxonomy can be seen as ontology construction; we also discuss this aspect as a general motivation.
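The abstract describes classification driven by recognising specific terms or phrases against a human-built taxonomy. As a rough illustration only, and not the authors' system, a minimal Python sketch of term-matching classification might look as follows. It assumes a flat, hand-curated mapping from categories to terms; the TAXONOMY contents and the function names term_pattern, score_document and classify are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical, hand-built term taxonomy (flat for brevity): each category
# lists terms or phrases a human expert associates with it. Categories and
# terms are illustrative only, not taken from the paper.
TAXONOMY = {
    "databases": ["query optimization", "relational", "index", "transaction"],
    "networking": ["routing", "packet", "tcp", "bandwidth"],
    "machine learning": ["classifier", "training set", "clustering", "feature"],
}


def term_pattern(term: str) -> re.Pattern:
    """Compile a case-insensitive pattern matching the term as whole words."""
    words = (re.escape(w) for w in term.split())
    # Allow any whitespace between the words of a multi-word phrase.
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


def score_document(text: str, taxonomy: dict[str, list[str]]) -> Counter:
    """Count occurrences of each category's terms in the text."""
    scores: Counter = Counter()
    for category, terms in taxonomy.items():
        for term in terms:
            hits = len(term_pattern(term).findall(text))
            if hits:
                scores[category] += hits
    return scores


def classify(text: str, taxonomy: dict[str, list[str]]) -> str | None:
    """Return the best-scoring category, or None if no taxonomy term occurs."""
    scores = score_document(text, taxonomy)
    if not scores:
        return None
    return scores.most_common(1)[0][0]
```

For example, the sentence "The classifier was evaluated on a training set after clustering the features." matches "classifier", "training set" and "clustering" (but not "feature", because of the whole-word match against "features"), so it scores 3 for "machine learning" and is assigned to that category. In the setting the abstract describes, the statistical matching is automated while the taxonomy itself would be curated by people.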

Keywords: document classification; taxonomy; ontology; clustering; statistical natural language processing.

DOI: 10.1504/IJMSO.2006.008768

International Journal of Metadata, Semantics and Ontologies, 2006, Vol. 1, No. 1, pp. 37-46

Published online: 23 Jan 2006
