Title: State-of-the-art automatic machine transliteration systems for Indic scripts: a comparative report
Authors: B.S. Sowmya Lakshmi; B.R. Shambhavi
Addresses (both authors): Department of Information Science and Engineering, BMS College of Engineering, Bangalore – 560019, Karnataka, India; Visvesvaraya Technological University, Belagavi, Karnataka, India
Abstract: Owing to the proliferation of social media and smartphones, the number of internet users has grown significantly. As a result of this globalisation, users increasingly expect content and services on the internet in their native languages. A fair share of user-generated data on the internet mixes native-language and English words, written either in Roman script (Romanised) or in the corresponding native script. This paper presents an exhaustive study of the machine transliteration systems developed for Indian languages over a span of two decades. The review shows that traditional machine learning algorithms such as support vector machines (SVM) and conditional random fields (CRF) yield excellent results for closely related languages, whereas probability-based statistical approaches are best suited when either the source or the target language is phonetically rich.
Keywords: NLP; natural language processing; ML; machine learning; Indic languages; automatic forward and backward transliteration; graphemes; phonemes.
DOI: 10.1504/IJAIP.2024.137188
International Journal of Advanced Intelligence Paradigms, 2024, Vol. 27, No. 2, pp. 150–177
Received: 11 May 2019
Accepted: 13 Feb 2020
Published online: 05 Mar 2024