Title: Assessing thesaurus-based annotations for semantic search applications

Authors: Kai Eckert, Magnus Pfeffer, Heiner Stuckenschmidt

Addresses: Computer Science Institute, University of Mannheim, A5, 6, 68159 Mannheim, Germany (all authors)

Abstract: Statistical methods for automated document indexing are becoming an alternative to the manual assignment of keywords. We argue that in automatic indexing the quality of the underlying thesaurus is of crucial importance, both in its ability to adequately cover the contents to be indexed and in its suitability as a basis for the specific indexing method used. We present an interactive tool for thesaurus evaluation that combines statistical measures with appropriate visualisation techniques to support the detection of potential problems in a thesaurus. We describe the methods used and show that the tool supports the detection and correction of errors, leading to better indexing results.
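The abstract does not specify which statistical measures the tool combines. As one illustration of the kind of measure involved, the following Python sketch assumes an information-content measure of the form commonly used for concept hierarchies, IC(c) = -log p(c), where an annotation with a concept also counts toward all of its broader concepts. It also lists thesaurus concepts that were never assigned, one simple signal of a coverage problem. The toy thesaurus, concept names, counts and function names are hypothetical, and the sketch assumes a monohierarchical thesaurus (each concept has at most one broader term). It is not the authors' method.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus: narrower concept -> broader concept (a tree).
BROADER = {
    "neural networks": "machine learning",
    "decision trees": "machine learning",
    "machine learning": "computer science",
    "databases": "computer science",
}

def propagate_counts(direct_counts, broader):
    """Add each concept's annotation count to itself and to every broader concept."""
    totals = Counter()
    for concept, n in direct_counts.items():
        node = concept
        while node is not None:
            totals[node] += n
            node = broader.get(node)  # None once the root is reached
    return totals

def information_content(direct_counts, broader):
    """IC(c) = -log p(c) = log(N / count(c)), with N the total count at the root(s)."""
    totals = propagate_counts(direct_counts, broader)
    roots = [c for c in totals if c not in broader]
    n_total = sum(totals[r] for r in roots)
    return {c: math.log(n_total / totals[c]) for c in totals if totals[c] > 0}

def unused_concepts(direct_counts, broader):
    """Concepts in the thesaurus that received no annotations, directly or via narrower terms."""
    all_concepts = set(broader) | set(broader.values())
    totals = propagate_counts(direct_counts, broader)
    return sorted(c for c in all_concepts if totals[c] == 0)

# Usage with hypothetical annotation counts from an indexed collection.
counts = {"neural networks": 6, "decision trees": 2}
for concept, ic in sorted(information_content(counts, BROADER).items()):
    print(f"{concept}: {ic:.3f}")
print("unused:", unused_concepts(counts, BROADER))
```

In this toy example "machine learning" ends up with the same information content as the root, because every annotation in the collection falls beneath it, while "databases" is never used. Patterns like these are the sort of thing a visualisation could surface for closer inspection.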

Keywords: content-based retrieval; thesaurus quality; thesaurus evaluation; information content; visualisation; thesaurus-based annotations; semantic search; automated indexing; document indexing; error detection; error correction; document retrieval; information retrieval; statistics.

DOI: 10.1504/IJMSO.2008.021205

International Journal of Metadata, Semantics and Ontologies, 2008, Vol. 3, No. 1, pp. 53-67

Published online: 10 Nov 2008
