Title: Syntactic parsing of clause constituents for statistical machine translation

Authors: Jianjun Ma; Jiahuan Pei; Degen Huang; Dingxin Song

Addresses: School of Foreign Languages, Dalian University of Technology, Dalian, China; School of Computer Science and Technology, Dalian University of Technology, Dalian, China; School of Computer Science and Technology, Dalian University of Technology, Dalian, China; School of Computer Science and Technology, Dalian University of Technology, Dalian, China

Abstract: In linguistics, the clause is regarded as the basic unit of grammar, a structure that sits between the chunk and the sentence. Clause constituents are therefore an important kind of linguistically valid syntactic phrase. This paper uses a conditional random fields (CRFs) model to recognise English clause constituents together with their syntactic functions, and tests their effect on machine translation by applying this syntactic information to an English-Chinese phrase-based statistical machine translation (PBSMT) system, evaluated on a business-domain corpus. Clause constituents are classified into six main kinds: subject, predicate, complement, adjunct, residues of predicate, and residues of complement. Results show that our rich-feature CRFs model achieves an F-measure of 93.31%, a precision of 93.26%, and a recall of 93.04%. This source-language syntactic knowledge is then combined with the NiuTrans phrasal SMT system, which slightly improves English-Chinese translation accuracy.

Keywords: syntactic parsing; clause constituents; phrase-based statistical machine translation; PBSMT.

DOI: 10.1504/IJCSE.2018.094424

International Journal of Computational Science and Engineering, 2018 Vol. 17 No. 1, pp. 126-132

Received: 30 Jul 2016
Accepted: 21 Aug 2016

Published online: 03 Sep 2018
