Title: Word sense disambiguation in Tamil using Indo-WordNet and cross-language semantic similarity

Authors: Deepa Karuppaiah; P.M. Durai Raj Vincent

Addresses: School of Information Technology and Engineering, Vellore Institute of Technology, Vellore – 632014, India; School of Information Technology and Engineering, Vellore Institute of Technology, Vellore – 632014, India

Abstract: Word sense disambiguation (WSD) is the task of determining the correct sense of a word in a given context. It is considered an important subtask in natural language processing, machine translation and information retrieval, and it has been found to improve the overall performance of these systems. The job of WSD is to eliminate all senses of a word except the one appropriate to the given context. Little work has been done on Tamil in information retrieval or natural language processing. WSD can be performed in a supervised or an unsupervised manner. Here, we propose an unsupervised approach that disambiguates Tamil words in a given context using the context words and their dictionary gloss definitions. We propose two variants of our approach. The first uses the number of overlapping words between the glosses of the context words, whereas the second uses the similarity between the glosses of the context words and the gloss of the ambiguous word. The second variant was found to be the better of the two. For our approach, we use glosses from the Tamil Indo-WordNet, the Oxford Tamil Dictionary and the English WordNet. Our method achieves better results in recognising correct senses in Tamil text.
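The two gloss-based variants described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The glosses are hypothetical English toy data rather than entries from Indo-WordNet or the Oxford Tamil Dictionary, and the names `SENSE_GLOSSES`, `CONTEXT_GLOSSES`, `overlap_score`, `cosine_score` and `disambiguate` are assumptions made for illustration. The similarity variant here uses a plain bag-of-words cosine as a stand-in for whatever similarity measure the paper actually applies.

```python
import math
from collections import Counter

# Hypothetical toy glosses (assumption); a real system would read them
# from a dictionary resource such as a WordNet.
SENSE_GLOSSES = {
    "bank#1": "financial institution that accepts deposits and lends money",
    "bank#2": "sloping land beside a river or lake",
}
CONTEXT_GLOSSES = {
    "deposit": "money placed in a financial institution",
    "loan": "money that an institution lends to a borrower",
}


def tokens(text):
    """Lowercase whitespace tokenisation; a deliberate simplification."""
    return text.lower().split()


def overlap_score(sense_gloss, context_glosses):
    """Variant 1 (sketch): count distinct words shared with each context gloss."""
    sense_words = set(tokens(sense_gloss))
    return sum(len(sense_words & set(tokens(g))) for g in context_glosses)


def cosine_score(sense_gloss, context_glosses):
    """Variant 2 (sketch): cosine similarity of bag-of-words gloss vectors."""
    a = Counter(tokens(sense_gloss))
    b = Counter(w for g in context_glosses for w in tokens(g))
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def disambiguate(sense_glosses, context_glosses, score):
    """Return the sense whose gloss scores highest against the context glosses."""
    return max(sense_glosses, key=lambda s: score(sense_glosses[s], context_glosses))


context = list(CONTEXT_GLOSSES.values())
print(disambiguate(SENSE_GLOSSES, context, overlap_score))
print(disambiguate(SENSE_GLOSSES, context, cosine_score))
```

With these toy glosses both scoring functions prefer the financial sense, because its gloss shares content words such as "money" and "institution" with the context glosses. A practical system would also need stop-word removal and morphological normalisation, which this sketch omits.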

Keywords: word sense disambiguation; WSD; natural language processing; Tamil WSD; cross-language similarity; gloss vector measure; Indo-WordNet; information retrieval.

DOI: 10.1504/IJIE.2021.112320

International Journal of Intelligent Enterprise, 2021 Vol.8 No.1, pp.62 - 73

Received: 23 Mar 2019
Accepted: 30 Aug 2019

Published online: 12 Jan 2021
