Title: Hybrid approach for semantic similarity calculation between Tamil words

Authors: Deepa Karuppaiah; P.M. Durai Raj Vincent

Addresses: Vellore Institute of Technology, Vellore, Tamilnadu, India; Vellore Institute of Technology, Vellore, Tamilnadu, India

Abstract: Semantic similarity, sometimes referred to as semantic relatedness, is an important concept that supports many applications involving natural language processing. The literature offers many similarity measures for computing the relationship among words in monolingual and cross-lingual documents. These measures help in understanding text, detecting plagiarism, information retrieval, etc. Based on the resources used, they can be categorised into corpus-based and knowledge-based measures. Such measures are plentiful for the English language, but for the Tamil language there is hardly any work on calculating the similarity between words. In this paper, we propose a similarity-finding technique that exploits knowledge from resources such as Tamil Indo WordNet, Tamil Wiktionary and the Oxford Tamil Dictionary. We use the definitions and example sentences available for each word in each of these resources for similarity calculation. The proposed approach is evaluated using the human-evaluated Miller-Charles and Rubenstein-Goodenough datasets.
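
As an illustration of the knowledge-based idea described in the abstract, the following is a minimal sketch of a definition-overlap score between the glosses of two words. It is not the authors' method: the Jaccard measure, the function names, the whitespace tokenisation and the toy English glosses are all assumptions made here for illustration. Real use would substitute definitions and example sentences drawn from the Tamil resources.

```python
def tokenize(text: str) -> set[str]:
    """Split a gloss into a set of lowercase tokens (assumed: whitespace tokenisation)."""
    return set(text.lower().split())


def gloss_overlap_similarity(gloss_a: str, gloss_b: str) -> float:
    """Jaccard overlap between the token sets of two glosses, in [0, 1]."""
    tokens_a, tokens_b = tokenize(gloss_a), tokenize(gloss_b)
    if not tokens_a or not tokens_b:
        return 0.0  # no evidence available for one of the words
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def combined_similarity(glosses_a: list[str], glosses_b: list[str]) -> float:
    """Average the overlap over paired glosses, one pair per resource (hypothetical scheme)."""
    scores = [gloss_overlap_similarity(a, b) for a, b in zip(glosses_a, glosses_b)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy glosses, invented for illustration; not taken from any dictionary.
    car = ["a road vehicle with four wheels", "a vehicle driven on roads"]
    automobile = ["a road vehicle with an engine", "a motor vehicle driven on roads"]
    print(combined_similarity(car, automobile))
```

Averaging per-resource scores is only one way to combine evidence from several dictionaries; weighting resources differently would be an equally plausible design.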

Keywords: semantic similarity; Tamil words similarity; Indo WordNet; knowledge-based similarity.

DOI: 10.1504/IJICA.2021.113609

International Journal of Innovative Computing and Applications, 2021 Vol.12 No.1, pp.13 - 23

Received: 20 Sep 2019
Accepted: 26 Nov 2019

Published online: 15 Mar 2021
