Title: Study on English machine translation based on feature extraction algorithm and big data information technology
Authors: Zheng Chao; Yixun Lin; Xingzu Zhan
Addresses: School of Big Data and Basic Sciences, Shandong Institute of Petroleum and Chemical Technology, Dongying, 257061, Shandong, China | School of Economics, Jinan University, Guangzhou, 510000, Guangdong, China; YX Tech Co., Ltd., Guangzhou, 518000, Guangdong, China | YX Tech Co., Ltd., Guangzhou, 518000, Guangdong, China; College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518000, Guangdong, China
Abstract: The proposed intelligent automatic English translation system leverages advanced feature extraction algorithms and big data technologies to enhance translation accuracy and efficiency. Central to this system is an N-Gram-based scoring model, which evaluates translation quality by analysing word sequences. This model is further refined through the development of an English corpus scoring framework, enabling more precise assessments. Incorporating Latent Dirichlet Allocation (LDA), the system employs weighted LDA indices to assess the semantic depth of translations. When these indices are well-aligned, they indicate a translation that captures the nuances and depth of the original text. Conversely, scattered LDA indices suggest a loss of key semantic elements during translation. The integration of behavioural decompression algorithms facilitates the optimisation of translation processes, ensuring that the system delivers high-quality English-Chinese translations by effectively capturing and preserving semantic information.
Keywords: feature extraction; big data information technology; English-Chinese translation; interactive.
International Journal of Data Science, 2025 Vol.10 No.7, pp.224 - 241
Received: 27 May 2025
Accepted: 23 Jul 2025
Published online: 16 Jan 2026
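
Illustrative note: the abstract describes an N-Gram-based scoring model that evaluates translation quality by analysing word sequences, but it does not give the scoring formula. The sketch below is therefore only an assumption about what such a score might look like. It uses the standard clipped n-gram precision (the building block of BLEU-style metrics), not the authors' method. The function names `ngrams` and `ngram_precision` and the example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate translation.

    Assumption: whitespace tokenisation and lower-casing are enough for
    this illustration; a real system would use a proper tokeniser.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        # No n-grams of this order in the candidate (e.g. empty string).
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference ("clipping").
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical example sentences, not taken from the paper.
reference = "the cat sat on the mat"
candidate = "the cat is on the mat"

unigram_score = ngram_precision(candidate, reference, 1)
bigram_score = ngram_precision(candidate, reference, 2)
print(f"unigram precision: {unigram_score:.3f}")
print(f"bigram precision:  {bigram_score:.3f}")
```

Higher-order n-grams reward correct word order, so a candidate that uses the right words in the wrong sequence scores well on unigrams but poorly on bigrams. This is one plausible reading of "analysing word sequences" in the abstract.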


