Title: Automatic discovery and ranking of synonyms for search keywords in the web

Authors: K.C. Srikantaiah; M.S. Roopa; N. Krishna Kumar; K.R. Venugopal; L.M. Patnaik

Addresses: Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore University, Bangalore, India; Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore University, Bangalore, India; Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore University, Bangalore, India; Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore University, Bangalore, India; DESE, Indian Institute of Science, Bangalore 560012, India

Abstract: Search engines are an indispensable part of a web user's life. A vast majority of web users experience difficulties with keyword-based search engines, such as inaccurate results for queries and irrelevant URLs being returned even though they contain the given keyword. Relevant URLs may also be missed because they contain a synonym of the keyword rather than the keyword itself. This condition is known as the polysemy problem. To alleviate these problems, we propose an algorithm called automatic discovery and ranking of synonyms for search keywords in the web (ADRS). The proposed method generates a list of candidate synonyms for individual keywords by employing the relevance factor of the URLs associated with the synonyms. These candidate synonyms are then ranked using co-occurrence frequencies and various page count-based measures. A major advantage of the algorithm is that it is highly scalable, which makes it applicable to online data on the dynamic, domain-independent and unstructured World Wide Web. The experimental results show that the best results are obtained using the proposed algorithm with WebJaccard.
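
As a rough illustration of the page count-based ranking step mentioned in the abstract, the sketch below scores candidate synonyms with the WebJaccard coefficient as it is commonly defined in the literature: the page count for the joint query divided by the page count for either term, set to zero when the joint count is too small to be reliable. This is not the authors' ADRS implementation. The function names, the threshold value and the page counts are all assumptions made for illustration; in practice the counts would come from a search engine.

```python
def web_jaccard(count_p, count_q, count_pq, threshold=5):
    """Page count-based Jaccard similarity between two terms P and Q.

    count_p  -- page count for query P
    count_q  -- page count for query Q
    count_pq -- page count for the joint query "P AND Q"
    threshold -- assumed cutoff below which the joint count is treated as noise
    """
    if count_pq <= threshold:
        return 0.0
    return count_pq / (count_p + count_q - count_pq)


def rank_synonyms(keyword_count, candidates):
    """Rank candidate synonyms of a keyword by WebJaccard, highest first.

    candidates maps each candidate term to a pair
    (page count of the candidate, page count of keyword AND candidate).
    """
    scored = [
        (term, web_jaccard(keyword_count, count, joint))
        for term, (count, joint) in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical page counts, invented purely for this example.
ranking = rank_synonyms(
    1000,
    {"vehicle": (600, 100), "auto": (800, 400), "rare": (10, 2)},
)
print(ranking)
```

With these made-up counts, "auto" ranks first because it shares the largest fraction of pages with the keyword, and "rare" scores zero because its joint count falls under the assumed threshold.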

Keywords: candidate synonyms; hyperlinks; inbound anchor text; synonym ranking; search engines; similarity measures; automatic discovery; keyword searching; keywords; web search; polysemy problem; URL relevance factors; URLs; information retrieval.

DOI: 10.1504/IJWS.2014.070668

International Journal of Web Science, 2014 Vol.2 No.4, pp.218 - 236

Received: 15 Mar 2014
Accepted: 12 Oct 2014

Published online: 17 Jul 2015
