Title: Improving English-Arabic statistical machine translation with morpho-syntactic and semantic word class

Authors: Ines Turki Khemakhem; Salma Jamoussi; Abdelmajid Ben Hamadou

Addresses: MIRACL Laboratory, University of Sfax, Tunisia (all three authors)

Abstract: In this paper, we present a new method for extracting and integrating morpho-syntactic and semantic word classes in a statistical machine translation (SMT) context to improve the quality of English-Arabic translation. The method can be applied to different statistical machine translation systems and to languages with complex morphological paradigms. We first identify morpho-syntactic word classes and use them to build our statistical language model. We then apply a semantic word clustering algorithm to English. The resulting semantic word classes are projected from the English side onto the Arabic side. This projection relies on the word alignments produced during the alignment step by the GIZA++ tool. Finally, we apply a new process to incorporate the semantic classes in order to improve SMT quality. We show the method's effectiveness on both small and large English-to-Arabic translation tasks. The experimental results show that introducing morpho-syntactic and semantic word classes yields a 7.7% relative improvement in BLEU score.
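The abstract does not say how class labels are transferred across alignment links. The following is a minimal sketch of one plausible scheme, majority voting: each Arabic word type receives the English semantic class it is most often aligned to. It should not be read as the authors' procedure. The corpus format (token lists plus `(english_index, arabic_index)` link pairs, such as symmetrized GIZA++ output), the function name `project_classes`, and the class labels are all hypothetical.

```python
from collections import Counter, defaultdict


def project_classes(corpus, en_class):
    """Project English word classes onto Arabic word types by majority vote.

    Hypothetical sketch, not the paper's method.
    corpus: iterable of (en_tokens, ar_tokens, links), where links is a list of
            (english_index, arabic_index) alignment pairs (assumed format).
    en_class: dict mapping an English word to its semantic class id.
    Returns a dict mapping each Arabic word type to its majority class.
    """
    votes = defaultdict(Counter)  # Arabic word -> Counter of English classes
    for en_tokens, ar_tokens, links in corpus:
        for i, j in links:
            cls = en_class.get(en_tokens[i])
            if cls is not None:  # English words without a class cast no vote
                votes[ar_tokens[j]][cls] += 1
    # Ties go to the class seen first, since most_common keeps insertion order.
    return {ar_word: c.most_common(1)[0][0] for ar_word, c in votes.items()}


# Toy example using Buckwalter-style transliteration for the Arabic side.
corpus = [
    (["the", "big", "book"], ["AlktAb", "Alkbyr"], [(1, 1), (2, 0)]),
    (["a", "book"], ["ktAb"], [(1, 0)]),
]
en_class = {"book": "OBJECT", "big": "SIZE"}
print(project_classes(corpus, en_class))
```

Voting over the whole corpus, rather than labelling each token independently, gives every Arabic word type a single class. This is convenient when the classes later serve as factors or language-model units, but it hides ambiguity for polysemous words.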

Keywords: morpho-syntactic word classes; semantic word classes; alignment; statistical machine translation; SMT.

DOI: 10.1504/IJISTA.2020.107225

International Journal of Intelligent Systems Technologies and Applications, 2020 Vol.19 No.2, pp.172 - 190

Accepted: 18 Sep 2018
Published online: 11 May 2020
