Title: Computational linguistic retrieval framework using negative bootstrapping for retrieving transliteration variants

Authors: Shashi Shekhar; Dilip Kumar Sharma; M.M. Sufyan Beg

Addresses: GLA University, Mathura-281406, India; GLA University, Mathura-281406, India; Aligarh Muslim University, Aligarh-202002, India

Abstract: Transliteration is an important yet relatively immature area of NLP. In mixed-script queries, transliteration raises issues such as language identification and script specification. To overcome these issues, we propose a new technique for transliteration called negative bootstrapping with frequent matrix apriori. Roman script is widely used in web search queries for searching content. The major challenge a system faces in processing a transliterated word is that the word exists in more than one form. Experimental evaluation was carried out to check transliteration accuracy, along with language identification, against established methods. The paper offers a principled solution for handling multiple scripts in a document, which lead to problems of term matching and spelling variation while searching content. The problem is modelled jointly with a deep-learning design, which achieves significantly better results when applied to an n-gram approach on the benchmark dataset.

Keywords: feature extraction; text categorisation; negative bootstrapping; apriori; transliteration; multilexical matching; substitution; variations; word normalisation; natural language processing; NLP; machine learning.

DOI: 10.1504/IJCVR.2020.104358

International Journal of Computational Vision and Robotics, 2020 Vol.10 No.1, pp.79 - 101

Received: 28 Nov 2018
Accepted: 26 Feb 2019

Published online: 06 Jan 2020
