Title: Sindice.com: a document-oriented lookup index for open linked data

Authors: Eyal Oren, Renaud Delbru, Michele Catasta, Richard Cyganiak, Holger Stenzhorn, Giovanni Tummarello

Addresses: The Network Institute, Vrije Universiteit Amsterdam, The Netherlands; Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland; Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland; Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland; Institute of Medical Biometry and Medical Informatics, University Medical Center Freiburg, Stefan-Meier-Str. 26, 79104 Freiburg, Germany; Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland

Abstract: Data discovery on the Semantic Web requires crawling and indexing of statements, in addition to the "linked-data" approach of de-referencing resource URIs. Existing Semantic Web search engines focus on database-like functionality, compromising on index size, query performance and live updates. We present Sindice, a lookup index over Semantic Web resources. Our index allows applications to automatically locate documents containing information about a given resource. In addition, we allow resource retrieval through inverse-functional properties, offer full-text search, and index SPARQL endpoints. Finally, we extend the sitemap protocol to efficiently index large datasets with minimal impact on data providers.

Keywords: semantic web; data discovery; indexing; scalability; lookup index; open linked data; document location; information retrieval; resource retrieval; full-text searching; sitemap protocol; large datasets.

DOI: 10.1504/IJMSO.2008.021204

International Journal of Metadata, Semantics and Ontologies, 2008 Vol.3 No.1, pp.37 - 52

Published online: 10 Nov 2008
