
Title: Study on English machine translation based on feature extraction algorithm and big data information technology

Authors: Zheng Chao; Yixun Lin; Xingzu Zhan

Addresses: School of Big Data and Basic Sciences, Shandong Institute of Petroleum and Chemical Technology, Dongying, 257061, Shandong, China | School of Economics, Jinan University, Guangzhou, 510000, Guangdong, China; YX Tech Co., Ltd., Guangzhou, 518000, Guangdong, China | YX Tech Co., Ltd., Guangzhou, 518000, Guangdong, China; College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518000, Guangdong, China

Abstract: The proposed intelligent automatic English translation system leverages advanced feature extraction algorithms and big data technologies to enhance translation accuracy and efficiency. Central to this system is an N-Gram-based scoring model, which evaluates translation quality by analysing word sequences. This model is further refined through the development of an English corpus scoring framework, enabling more precise assessments. Incorporating Latent Dirichlet Allocation (LDA), the system employs weighted LDA indices to assess the semantic depth of translations. When these indices are well-aligned, they indicate a translation that captures the nuances and depth of the original text. Conversely, scattered LDA indices suggest a loss of key semantic elements during translation. The integration of behavioural decompression algorithms facilitates the optimisation of translation processes, ensuring that the system delivers high-quality English-Chinese translations by effectively capturing and preserving semantic information.
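
Illustration (not from the paper): the abstract does not state the formula behind its N-Gram-based scoring model. As a minimal sketch of how a score can be derived from word sequences, the Python below assumes a clipped n-gram precision against a single reference translation, in the style of common overlap metrics. The function names `ngrams` and `ngram_precision` and the example sentences are hypothetical and chosen only for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-token sequence in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference.

    Assumption: whitespace tokenisation and a single reference sentence.
    The paper's actual scoring model may differ.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Clip each candidate n-gram count by its count in the reference,
    # so repeating a matching n-gram cannot inflate the score.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


reference = "the cat sat on the mat"
candidate = "the cat sat on a mat"
unigram = ngram_precision(candidate, reference, 1)
bigram = ngram_precision(candidate, reference, 2)
print(f"unigram precision: {unigram:.3f}, bigram precision: {bigram:.3f}")
```

In this sketch a single wrong word costs one unigram match but two bigram matches, which is why higher-order n-grams are more sensitive to local word-order and word-choice errors.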

Keywords: feature extraction; big data information technology; English-Chinese translation; interactive.

DOI: 10.1504/IJDS.2025.151187

International Journal of Data Science, 2025 Vol.10 No.7, pp.224 - 241

Received: 27 May 2025
Accepted: 23 Jul 2025
Published online: 16 Jan 2026
