Authors: Suraiya Jabin; Niladri Chatterjee; Suos Samak; Kim Sokphyrum; Javier Sola
Addresses: Department of Computer Science, Jamia Millia Islamia, Central University, New Delhi 110025, India; Department of Mathematics, Indian Institute of Technology Delhi, New Delhi 110016, India; Open Institute Organization, P.O. Box 1552, Phnom Penh, Cambodia; Open Institute Organization, P.O. Box 1552, Phnom Penh, Cambodia; Open Institute Organization, P.O. Box 1552, Phnom Penh, Cambodia
Abstract: This paper describes the design of an online hybrid machine translation (MT) system for Khmer, a low-resource language and the official language of Cambodia. The proposed system uses the open-source statistical machine translation (SMT) toolkit DoMY CE as its primary translation tool. The parallel corpora were prepared from various sources and the Headley Khmer-English dictionary. The language model, translation model and decoder were configured using the DoMY toolkit. A post-processing step that applies a part-of-speech tagger enhances the quality of the target-language sentences. Experimental results demonstrate the success of the proposed scheme with English as the source and Khmer as the target language. In our experiments, the proposed model achieved good National Institute of Standards and Technology (NIST) and BiLingual Evaluation Understudy (BLEU) scores. Various web technologies were used to develop the online translation system.
Keywords: computational linguistics; English-Khmer parallel corpora; English-Khmer translation; Khmer language; Moses toolkit; statistical machine translation; phrase-based model; rule-based machine translation system; hybrid machine translation system.
International Journal of Intelligent Systems Technologies and Applications, 2018 Vol.17 No.3, pp.292 - 309
Available online: 10 Aug 2018