Title: Improving English-Arabic statistical machine translation with morpho-syntactic and semantic word class
Authors: Ines Turki Khemakhem; Salma Jamoussi; Abdelmajid Ben Hamadou
Addresses: MIRACL Laboratory, University of Sfax, Tunisia; MIRACL Laboratory, University of Sfax, Tunisia; MIRACL Laboratory, University of Sfax, Tunisia
Abstract: In this paper, we present a new method for extracting and integrating morpho-syntactic and semantic word classes in a statistical machine translation (SMT) context to improve the quality of English-Arabic translation. It can be applied across different statistical machine translation systems and to languages with complicated morphological paradigms. In our method, we first identify morpho-syntactic word classes to build our statistical language model. Then, we apply a semantic word clustering algorithm to English. The resulting semantic word classes are projected from the English side to the featured Arabic side. This projection is based on the word alignments produced in the alignment step by the GIZA++ tool. Finally, we apply a new process that incorporates the semantic classes to improve SMT quality. We show its efficacy on small and larger English-to-Arabic translation tasks. The experimental results show that introducing morpho-syntactic and semantic word classes achieves a 7.7% relative improvement in BLEU score.
Keywords: morpho-syntactic word classes; semantic word classes; alignment; statistical machine translation; SMT.
DOI: 10.1504/IJISTA.2020.107225
International Journal of Intelligent Systems Technologies and Applications, 2020 Vol.19 No.2, pp.172 - 190
Accepted: 18 Sep 2018
Published online: 11 May 2020
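
The projection step described in the abstract, carrying semantic word classes from the English side to the Arabic side through word alignments, can be illustrated with a minimal sketch. The abstract does not specify the projection rule, so the majority vote used here is an assumption, not the authors' procedure. The function name `project_classes`, the class labels, the transliterated Arabic tokens, and the alignment links are all hypothetical toy data; in practice the `(english_index, arabic_index)` pairs would be derived from the GIZA++ alignment output.

```python
from collections import Counter, defaultdict


def project_classes(en_sents, ar_sents, alignments, en_class):
    """Project English word classes onto Arabic words via word alignments.

    en_sents, ar_sents: parallel lists of tokenized sentences.
    alignments: for each sentence pair, a list of (en_index, ar_index) links.
    en_class: dict mapping an English word to its semantic class label.
    Returns a dict mapping each aligned Arabic word to the class it was
    most often linked to (majority vote, an assumption of this sketch).
    """
    votes = defaultdict(Counter)
    for en, ar, links in zip(en_sents, ar_sents, alignments):
        for i, j in links:
            label = en_class.get(en[i])
            if label is not None:  # skip English words without a class
                votes[ar[j]][label] += 1
    return {word: counts.most_common(1)[0][0] for word, counts in votes.items()}


# Hypothetical toy data: one sentence pair with transliterated Arabic tokens.
en_sents = [["the", "boy", "reads"]]
ar_sents = [["yaqra", "al-walad"]]
alignments = [[(1, 1), (2, 0)]]  # boy -> al-walad, reads -> yaqra
en_class = {"boy": "C_PERSON", "reads": "C_ACTION"}

projected = project_classes(en_sents, ar_sents, alignments, en_class)
print(projected)
```

Voting over the whole corpus rather than per sentence gives each Arabic word type a single class, which is one simple way to absorb noisy alignment links; other choices, such as keeping per-token classes, are equally plausible.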