Title: A comprehensive understanding of popular machine translation evaluation metrics
Authors: Md. Adnanul Islam; Md. Saddam Hossain Mukta
Addresses: Department of Computer Science and Engineering, Military Institute of Science and Technology, Dhaka, Bangladesh; Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
Abstract: Machine translation is one of the pioneering applications of natural language processing and artificial intelligence. Automatic evaluation of machine translation performance is one of the most challenging tasks in the field, as manual evaluation of large volumes of translated documents is infeasible in practice. Thus, to facilitate automatic evaluation of translation performance, several metrics have been introduced and are widely used. Although these evaluation metrics cannot match the reliability of human evaluation, they are commonly employed to assess the translation quality of texts across diverse application domains. This article discusses three such widely used evaluation metrics, BLEU, METEOR, and TER, in detail, demonstrating their calculations step by step. The main novelty of this article lies in its use of several example translations to present and clarify the calculation process of these three popular metrics for measuring the performance, or quality, of machine translation. Moreover, the article presents a comparative analysis of the three metrics on two different datasets to reveal their behavioural similarities and differences.
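To give a flavour of the step-by-step calculations the article walks through, the following is a minimal illustrative sketch of a simplified TER-style score: word-level edit distance divided by reference length. This is an assumption-laden simplification, not the paper's exact procedure; in particular, full TER also allows block shifts, which are omitted here for brevity.

```python
# Simplified TER-style score (assumed sketch, not the paper's exact method):
# number of word-level edits (insert/delete/substitute) divided by the
# number of words in the reference. Full TER additionally counts shifts.

def word_edit_distance(hyp, ref):
    """Levenshtein distance over word tokens."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all hypothesis words
    for j in range(n + 1):
        d[0][j] = j  # insert all reference words
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[m][n]

def simplified_ter(hypothesis, reference):
    hyp, ref = hypothesis.split(), reference.split()
    return word_edit_distance(hyp, ref) / len(ref)

hyp = "the cat sat on mat"
ref = "the cat sat on the mat"
# One insertion over six reference words gives 1/6.
print(round(simplified_ter(hyp, ref), 3))
```

Lower scores indicate fewer edits and hence a translation closer to the reference, which is why TER behaves inversely to precision-based metrics such as BLEU.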
Keywords: evaluation metrics; translation performance; bilingual evaluation understudy; BLEU; METEOR; translation edit rate; TER; machine translation.
International Journal of Computational Science and Engineering, 2022, Vol.25, No.5, pp.467-478
Received: 15 Mar 2021
Accepted: 02 Oct 2021
Published online: 18 Oct 2022