Authors: G. Sudha Sadasivam; K.G. Saranya; K.G. Karrthik
Addresses: Department of Computer Science and Engineering, PSG College of Technology, Coimbatore – 641004, India; Department of Computer Science and Engineering, PSG College of Technology, Coimbatore – 641004, India; Department of Computer Science and Engineering, PSG College of Technology, Coimbatore – 641004, India
Abstract: Wikipedia is a free, web-based encyclopaedia. This paper addresses the knowledge integration issue by computing semantic relatedness over a graph derived from Wikipedia, treating the articles as nodes and the links between articles as edges. Sentences containing the most frequently occurring keywords are extracted. These complex sentences are split into simple sentences, from which triplets with synonyms are extracted. A hypergraph structure is formed using hypernyms of the keywords to cluster the articles. Hypernyms extracted from the search query, together with keyword co-occurrences, are used to retrieve relevant articles. Mapping the articles under each hypernym category to an in-memory structure improves search efficiency and facilitates personalisation. The proposed approach preserves the implied relationships between articles in the graph structure and maintains semantic relatedness between them. Further, clustering the articles within the graph structure by hypernym narrows down the search.
Keywords: Wikipedia search; hyper graph; semantics; hypernyms; in-memory; persistent graph; knowledge integration; keywords; synonyms; keyword co-occurrences; information retrieval; search efficiency; personalisation.
International Journal of Web Science, 2013 Vol.2 No.1/2, pp.66 - 79
Received: 05 Sep 2012
Accepted: 29 Oct 2012
Published online: 25 Sep 2013
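
The hypernym-based clustering and in-memory lookup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `HYPERNYMS` table, the sample `ARTICLES`, and the function names `build_hypernym_index` and `search` are all hypothetical, and a toy dictionary stands in for whatever lexical resource the paper uses to find hypernyms. Each hypernym acts as a hyperedge that groups every article whose keywords fall under it, so a query only needs to inspect the clusters named by the hypernyms of its own terms.

```python
from collections import defaultdict

# Hypothetical toy hypernym table (an assumption for illustration only);
# a real system would consult a lexical resource instead.
HYPERNYMS = {
    "python": "programming language",
    "java": "programming language",
    "sparrow": "bird",
    "eagle": "bird",
}


def build_hypernym_index(articles):
    """Map each hypernym to the set of article titles whose keywords fall under it.

    Each hypernym plays the role of a hyperedge connecting many articles.
    """
    index = defaultdict(set)
    for title, keywords in articles.items():
        for keyword in keywords:
            hypernym = HYPERNYMS.get(keyword.lower())
            if hypernym is not None:
                index[hypernym].add(title)
    return index


def search(index, query):
    """Return sorted titles from the clusters named by hypernyms of the query terms."""
    hits = set()
    for term in query.lower().split():
        hypernym = HYPERNYMS.get(term)
        if hypernym is not None:
            hits |= index.get(hypernym, set())
    return sorted(hits)


# Hypothetical sample data: article title -> extracted keywords.
ARTICLES = {
    "Python (programming language)": ["python", "interpreter"],
    "Java (programming language)": ["java", "bytecode"],
    "House sparrow": ["sparrow", "passerine"],
}

INDEX = build_hypernym_index(ARTICLES)
```

Calling `search(INDEX, "eagle habitat")` looks up only the "bird" cluster, so articles filed under other hypernyms are never scanned. Keeping the index as a plain dictionary of sets mirrors the in-memory mapping the abstract mentions; ranking by keyword co-occurrence is left out of this sketch.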