Improving English-Arabic statistical machine translation with morpho-syntactic and semantic word class
Online publication date: Mon, 11-May-2020
by Ines Turki Khemakhem; Salma Jamoussi; Abdelmajid Ben Hamadou
International Journal of Intelligent Systems Technologies and Applications (IJISTA), Vol. 19, No. 2, 2020
Abstract: In this paper, we present a new method for extracting and integrating morpho-syntactic and semantic word classes in a statistical machine translation (SMT) context to improve the quality of English-Arabic translation. The method can be applied across different statistical machine translation systems and to languages with complicated morphological paradigms. We first identify morpho-syntactic word classes, which are used to build our statistical language model. We then apply a semantic word clustering algorithm to English. The resulting semantic word classes are projected from the English side to the featured Arabic side, using the word alignments produced in the alignment step by the GIZA++ tool. Finally, we apply a new process that incorporates the semantic classes in order to improve SMT quality. We show its efficacy on small and larger English-to-Arabic translation tasks. The experimental results show that introducing morpho-syntactic and semantic word classes yields a 7.7% relative improvement in BLEU score.
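The projection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes word alignments are already available as (source index, target index) pairs per sentence, and it uses a simple majority vote to pick a class for each Arabic word, which the abstract does not specify. The function name `project_classes`, the class labels, and the toy data (Arabic shown in transliteration) are all hypothetical.

```python
from collections import Counter, defaultdict


def project_classes(sentence_pairs, src_classes):
    """Give each target word the class seen most often on its aligned source words.

    sentence_pairs: iterable of (src_tokens, tgt_tokens, alignment), where
    alignment is a list of (src_index, tgt_index) pairs (assumed input format).
    src_classes: dict mapping a source word to its semantic class label.
    """
    votes = defaultdict(Counter)
    for src_tokens, tgt_tokens, alignment in sentence_pairs:
        for i, j in alignment:
            cls = src_classes.get(src_tokens[i])
            if cls is not None:  # skip source words that have no class
                votes[tgt_tokens[j]][cls] += 1
    # Majority vote per target word (an assumption, not the paper's rule).
    return {word: counts.most_common(1)[0][0] for word, counts in votes.items()}


# Toy data invented for illustration only.
src_classes = {"book": "C1", "house": "C2", "read": "C3"}
sentence_pairs = [
    (["the", "book"], ["alkitab"], [(0, 0), (1, 0)]),
    (["read", "the", "book"], ["iqra", "alkitab"], [(0, 0), (2, 1)]),
    (["the", "house"], ["albayt"], [(1, 0)]),
]
projected = project_classes(sentence_pairs, src_classes)
```

In this sketch, source words without a class (such as "the") contribute no vote, so a target word aligned only to unclassified words receives no class.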