Title: A Wikipedia-based approach to conceptual indexing and retrieval of documents

Authors: Carlo Abi Chahine; Nathalie Chaignaud; Jean-Philippe Kotowicz; Jean-Pierre Pecuchet

Address (all authors): INSA Rouen LITIS – EA 4108, BP08, 76801 Saint-Etienne du Rouvray, France

Abstract: This paper describes a support system that helps archivists index and retrieve documents. Our method uses the Wikipedia category network as a conceptual taxonomy. For each document, a directed acyclic graph (DAG) is built by mapping terms (single words or multi-word expressions) to concepts in the Wikipedia category network. Properties of the graph are then used to weight these concepts, and topics and keywords are proposed on the basis of the so-called important concepts. Conceptual indexing consists of finding the relevant Wikipedia articles and categories that can be used to describe the text. Conceptual retrieval uses these articles and categories to return the documents relevant to a user query. Finally, a proof-of-concept prototype is presented.
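The abstract does not say which graph properties weight the concepts or how queries are matched to documents, so the following is only a minimal sketch of the general pipeline under stated assumptions. The toy category network (`PARENTS`), the term lookup (`TERM_TO_CATEGORY`), the distance-decayed weighting (`decay`) and the cosine ranking are all hypothetical stand-ins, not the authors' method; it also uses categories only, not Wikipedia articles. Because the real category network contains cycles, the sketch builds the per-document DAG with a colored depth-first search that drops back edges.

```python
from collections import Counter, deque
from math import sqrt

# Hypothetical toy slice of a category network: category -> parent categories.
PARENTS = {
    "Archives": ["Information management", "Libraries"],
    "Libraries": ["Information management"],
    "Search engines": ["Information retrieval"],
    "Information retrieval": ["Information management"],
    "Information management": ["Information"],
    "Information": [],
}

# Hypothetical term -> category lookup (stands in for real term-to-concept mapping).
TERM_TO_CATEGORY = {
    "archive": "Archives",
    "archivist": "Archives",
    "library": "Libraries",
    "search engine": "Search engines",
    "retrieval": "Information retrieval",
}


def build_dag(seeds, max_depth=3):
    """Walk upward from the seed categories with a colored DFS.

    Edges pointing to a node still on the DFS stack (GRAY) would close a
    cycle, so they are dropped; every kept edge goes to a node that finishes
    earlier, which makes the result acyclic. Returns child -> [parents].
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color, up = {}, {}

    def visit(node, depth):
        color[node] = GRAY
        if depth < max_depth:  # nodes at the depth limit are kept but not expanded
            for parent in PARENTS.get(node, []):
                state = color.get(parent, WHITE)
                if state == GRAY:
                    continue  # back edge: would create a cycle
                up.setdefault(node, []).append(parent)
                if state == WHITE:
                    visit(parent, depth + 1)
        color[node] = BLACK

    for seed in sorted(seeds):  # sorted for a deterministic DAG
        if seed not in color:
            visit(seed, 0)
    return up


def weight_concepts(terms, decay=0.5):
    """Weight each concept by the occurrences of terms that reach it,
    damped by decay ** (shortest DAG distance from the term's category)."""
    counts = Counter(t for t in terms if t in TERM_TO_CATEGORY)
    up = build_dag({TERM_TO_CATEGORY[t] for t in counts})
    weights = Counter()
    for term, count in counts.items():
        start = TERM_TO_CATEGORY[term]
        dist, queue = {start: 0}, deque([start])
        while queue:  # BFS over the DAG for shortest distances
            node = queue.popleft()
            for parent in up.get(node, []):
                if parent not in dist:
                    dist[parent] = dist[node] + 1
                    queue.append(parent)
        for concept, d in dist.items():
            weights[concept] += count * decay ** d
    return weights


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_terms, index):
    """Rank indexed documents by cosine similarity of concept weights."""
    q = weight_concepts(query_terms)
    scores = {doc: cosine(q, vec) for doc, vec in index.items()}
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "d1": ["archive", "archivist", "library", "archive"],
    "d2": ["search engine", "retrieval", "retrieval"],
}
index = {name: weight_concepts(terms) for name, terms in docs.items()}
print(index["d1"].most_common(2))  # [('Archives', 3.0), ('Libraries', 2.5)]
print(retrieve(["archive"], index))  # ['d1', 'd2']
```

The decay factor is one simple way to keep very broad ancestors such as "Information" from dominating the proposed keywords; without it, every root category would receive the largest weight simply because all terms reach it.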

Keywords: document indexing; document retrieval; similarity measures; knowledge representation; Wikipedia; archivists; category network; conceptual taxonomy; directed acyclic graphs; DAG; conceptual indexing; information retrieval.

DOI: 10.1504/IJKL.2014.067172

International Journal of Knowledge and Learning, 2014 Vol. 9 No. 1/2, pp. 87-103

Received: 12 Jun 2013
Accepted: 16 Jun 2014

Published online: 31 Jan 2015
