Title: Relevance measures for XML information retrieval

Authors: Olli Luoma

Addresses: Department of Information Technology, University of Turku, FIN-20014, Finland

Abstract: In recent years, considerable work has been carried out to develop efficient methods for storing and querying XML data. Most of the proposals have approached the subject from the database point of view, i.e., they have primarily aimed at providing exact matching capabilities. The problem can, however, also be addressed as an information-retrieval problem, which introduces challenges such as the need for relevance ranking. The vast majority of the previous proposals have based the ranking primarily on content and, where structural properties were taken into account, only containment relationships were considered. In this paper, we focus on ranking the results based on their structural properties and aim at supporting a wide range of structural operations, such as operations based on preceding/following relationships. Our method is based on a fuzzy interpretation of the XPath query language, which is also discussed in this paper. Finally, we discuss a relational implementation of our model and present the results of our experiments.

Keywords: information retrieval; semistructured documents; XML data; relevance ranking; XPath query language.

DOI: 10.1504/IJWGS.2007.014073

International Journal of Web and Grid Services, 2007 Vol. 3 No. 2, pp. 170-193

Published online: 17 Jun 2007
