Title: State-of-the-art automatic machine transliteration systems for Indic scripts: a comparative report

Authors: B.S. Sowmya Lakshmi; B.R. Shambhavi

Addresses: Department of Information Science and Engineering, BMS College of Engineering, Bangalore – 560019, Karnataka, India; Visvesvaraya Technological University, Belagavi, Karnataka, India ' Department of Information Science and Engineering, BMS College of Engineering, Bangalore – 560019, Karnataka, India; Visvesvaraya Technological University, Belagavi, Karnataka, India

Abstract: Due to the proliferation of social media and smartphones, the number of internet users has increased significantly. As a result of this globalisation, users increasingly demand support for native languages on the internet. A fair share of the data generated by users on the internet combines native-language and English words, written either in Romanised form or in the corresponding language scripts. This paper presents an exhaustive study of machine transliteration systems used for Indian languages over a span of two decades. The review shows that traditional machine learning algorithms such as support vector machines (SVM) and conditional random fields (CRF) yield excellent outcomes for strongly associated languages, whereas probability-based statistical approaches are best suited when either the source or the target language is phonetically rich.

Keywords: NLP; natural language processing; ML; machine learning; Indic languages; automatic forward and backward transliteration; graphemes; phonemes.

DOI: 10.1504/IJAIP.2024.137188

International Journal of Advanced Intelligence Paradigms, 2024 Vol.27 No.2, pp.150 - 177

Received: 11 May 2019
Accepted: 13 Feb 2020

Published online: 05 Mar 2024
