Title: An ensemble of grapheme and phoneme-based models for automatic English to Kannada back-transliteration

Authors: B.S. Sowmya Lakshmi; B.R. Shambhavi

Addresses: Department of Information Science and Engineering, BMS College of Engineering, Bangalore, India; Affiliated to: Visvesvaraya Technological University, Belagavi, Karnataka, India (both authors)

Abstract: Machine transliteration is the task of mapping the graphemes or phonemes of one language into phoneme approximations of another language. This paper proposes three machine transliteration approaches, a 'grapheme-based model', a 'phoneme-based model' and a 'hybrid model', for back-transliterating Romanised Kannada words into the native script of Kannada, a resource-poor language. A bilingual corpus of around 3 lakh (300,000) words is built, comprising pairs of a Romanised Kannada word and its corresponding word in Kannada script. The models are evaluated on 3,000 Romanised Kannada test words. The hybrid model achieved the highest accuracy of the three, 85.93%.

Keywords: transliteration; bilingual corpus; rule-based approach; statistical approach.

DOI: 10.1504/IJISC.2021.113314

International Journal of Intelligence and Sustainable Computing, 2021 Vol.1 No.2, pp.138 - 150

Received: 04 Dec 2018
Accepted: 17 Jan 2019
Published online: 26 Feb 2021
