Title: Statistical machine translation system for English to Urdu

Authors: Shahnawaz; R.B. Mishra

Address (both authors): Department of Computer Engineering, Indian Institute of Technology, Banaras Hindu University, IIT (BHU), Varanasi-221005 U.P., India

Abstract: English and Urdu belong to different language families and follow different grammatical structures. When the source and target languages differ in linguistic features, chiefly in sentence structure as English and Urdu do, machine translation becomes more challenging. Urdu is also a morphologically rich language. A factored translation model handles such languages on the target side by integrating linguistic features with the words. In the factored corpus we created for the factored translation model, each word's surface form is combined with factors such as its lemma and POS tag. We present a system model for English-to-Urdu machine translation that uses GIZA++, SRILM and Moses. Moses is used for decoding and for training the factored translation model with minimum error rate training. We evaluate the translation output of the system using n-gram BLEU, precision, recall, F-measure and METEOR.
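The factored corpus described in the abstract attaches a lemma and a POS tag to each surface word. Moses reads factored corpora as space-separated tokens whose factors are joined by `|` (for example `word|lemma|POS`). The following is a minimal sketch, assuming the (surface, lemma, POS) triples have already been produced by some lemmatiser and tagger that are not shown here and are not necessarily those the authors used. The romanised Urdu words, lemmas and tag names are illustrative only, and the helper names `to_factored_line` and `from_factored_line` are hypothetical.

```python
from typing import List, Tuple

# A token as (surface form, lemma, POS tag). The tag set here is hypothetical.
Token = Tuple[str, str, str]

FACTOR_SEP = "|"  # Moses separates the factors of one token with '|'


def to_factored_line(tokens: List[Token]) -> str:
    """Join each token's factors with '|' and the tokens with single spaces."""
    parts = []
    for surface, lemma, pos in tokens:
        for factor in (surface, lemma, pos):
            # '|' would be read as a factor boundary and a space as a token boundary.
            if FACTOR_SEP in factor or " " in factor:
                raise ValueError(f"factor {factor!r} contains a reserved character")
        parts.append(FACTOR_SEP.join((surface, lemma, pos)))
    return " ".join(parts)


def from_factored_line(line: str) -> List[Token]:
    """Split a factored line back into (surface, lemma, pos) triples."""
    tokens = []
    for chunk in line.split():
        surface, lemma, pos = chunk.split(FACTOR_SEP)  # raises if not exactly 3 factors
        tokens.append((surface, lemma, pos))
    return tokens


if __name__ == "__main__":
    # Romanised Urdu for "the boys are playing"; lemmas and tags are illustrative.
    sentence = [
        ("larke", "larka", "NN"),
        ("khel", "khelna", "VB"),
        ("rahe", "rahna", "AUX"),
        ("hain", "hona", "AUX"),
    ]
    line = to_factored_line(sentence)
    print(line)  # larke|larka|NN khel|khelna|VB rahe|rahna|AUX hain|hona|AUX
    assert from_factored_line(line) == sentence
```

Rejecting factors that contain `|` or spaces keeps the format unambiguous, so a line always splits back into the same triples.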
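The abstract names n-gram BLEU, precision, recall and F-measure as evaluation scores but does not specify the exact variants. The sketch below therefore assumes the standard single-reference, sentence-level BLEU (clipped n-gram precisions up to 4-grams, uniform weights, no smoothing, times a brevity penalty). It also assumes precision and recall are computed from clipped unigram overlap, with F-measure as their harmonic mean. The function names are hypothetical. METEOR, which adds stem and synonym matching, is not reproduced.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hyp: Sequence[str], ref: Sequence[str], n: int) -> float:
    """Share of hypothesis n-grams found in the reference, each clipped to its reference count."""
    hyp_counts = ngram_counts(hyp, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    ref_counts = ngram_counts(ref, n)
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


def sentence_bleu(hyp: Sequence[str], ref: Sequence[str], max_n: int = 4) -> float:
    """Unsmoothed BLEU: geometric mean of the 1..max_n precisions times the brevity penalty."""
    precisions = [clipped_precision(hyp, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # any zero precision makes the geometric mean zero
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: hypotheses no longer than the reference get exp(1 - r/c).
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1.0 - len(ref) / len(hyp))
    return bp * math.exp(log_mean)


def unigram_prf(hyp: Sequence[str], ref: Sequence[str]) -> Tuple[float, float, float]:
    """Unigram precision, recall and F-measure from clipped token overlap."""
    overlap = sum((Counter(hyp) & Counter(ref)).values())  # '&' keeps the minimum count per word
    precision = overlap / len(hyp) if hyp else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    hyp = "the boy is playing in the park".split()
    ref = "the boy is playing in the garden".split()
    print(sentence_bleu(hyp, ref))
    print(unigram_prf(hyp, ref))
```

In this example the hypothesis differs from its reference only in the last of seven words. The 1- to 4-gram precisions are 6/7, 5/6, 4/5 and 3/4, so BLEU is (3/7)^(1/4), about 0.81, while unigram precision, recall and F-measure are all 6/7, about 0.86. The gap illustrates why BLEU's higher-order n-grams penalise local word-order and word-choice errors more than unigram overlap does.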

Keywords: English; Urdu; statistical machine translation; factored translation model; language translation.

DOI: 10.1504/IJAIP.2013.056421

International Journal of Advanced Intelligence Paradigms, 2013 Vol.5 No.3, pp.182-203

Received: 26 Jul 2012
Accepted: 09 Nov 2012

Published online: 30 Jul 2014
