Title: English oral pronunciation recognition based on improved deep neural networks
Authors: Lifang Cheng
Addresses: School of Primary Education, Shangrao Preschool Education College, Shangrao, Jiangxi, 334000, China
Abstract: To address the poor performance, low F1-score, and high error rate of traditional English oral pronunciation recognition methods, an English oral pronunciation recognition method based on improved deep neural networks is proposed. First, the spoken English pronunciation signals and videos are pre-processed to extract audio and lip features. Then, the extracted multimodal features are fused. Finally, the multimodal feature fusion result is used as the input vector and the English spoken pronunciation recognition result as the output vector. A deep neural network model with an attention module added before the first fully connected layer is built to obtain the recognition results. The experimental results show that the proposed method has good recognition performance, with an F1-score consistently above 0.95 and an error rate of no more than 1%, and it can be further extended to related fields.
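The pipeline summarized in the abstract (fuse audio and lip features, apply an attention module before the first fully connected layer, then classify) can be illustrated with a minimal forward-pass sketch. The abstract does not specify the fusion method, the form of the attention module, or any layer sizes, so all of these are assumptions here. Fusion is plain concatenation, attention is a softmax-normalized per-dimension reweighting, the weights are random and untrained, bias terms are omitted, and the names `fuse`, `attention`, `forward`, and `init_params`, as well as all dimensions, are hypothetical.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax over a list of floats
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weights, vec):
    # Multiply a matrix (list of rows) by a vector
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def relu(vec):
    return [max(0.0, a) for a in vec]


def fuse(audio_feats, lip_feats):
    # Assumption: feature-level fusion by simple concatenation
    return list(audio_feats) + list(lip_feats)


def attention(x, w_att):
    # Assumed attention form: one score per fused dimension, normalized
    # with softmax, then used to reweight the fused vector
    weights = softmax(matvec(w_att, x))
    return [wi * xi for wi, xi in zip(weights, x)], weights


def init_params(in_dim, hidden, n_classes, seed=0):
    # Random, untrained weights for illustration only (no bias terms)
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_att": mat(in_dim, in_dim),
        "w_fc1": mat(hidden, in_dim),
        "w_out": mat(n_classes, hidden),
    }


def forward(audio, lip, params):
    x = fuse(audio, lip)                          # fused multimodal input vector
    x_att, _ = attention(x, params["w_att"])      # attention before first FC layer
    h = relu(matvec(params["w_fc1"], x_att))      # first fully connected layer
    return softmax(matvec(params["w_out"], h))    # class probabilities


audio = [0.2, -0.1, 0.5, 0.3]   # hypothetical audio feature vector
lip = [0.7, 0.1, -0.4]          # hypothetical lip feature vector
params = init_params(in_dim=7, hidden=5, n_classes=3)
probs = forward(audio, lip, params)
print(probs, probs.index(max(probs)))
```

Training (for example, cross-entropy loss with backpropagation) is omitted; the sketch only shows where the attention step sits relative to the first fully connected layer.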
Keywords: spoken English; pronunciation recognition; deep neural network; attention module; multimodal feature fusion.
International Journal of Biometrics, 2026 Vol.18 No.1/2/3, pp. 87-107
Received: 30 Dec 2024
Accepted: 23 Mar 2025
Published online: 13 Jan 2026