
Title: A method for capturing English oral pronunciation errors based on residual networks and gated convolutional networks

Authors: Mingxia Jiang; Yao Zhao

Addresses: College of Navigation Technology, Jiangsu Maritime Institute, Nanjing, 211170, China; Dean's Office, Jiangsu Maritime Institute, Nanjing, 211170, China

Abstract: To address the issues of low classification accuracy and poor capture accuracy in detecting English oral pronunciation errors, this study introduces a novel approach leveraging residual networks and gated convolutional networks. First, a multimodal English spoken pronunciation corpus is established by converting annotated corpus data. Next, Mel-frequency cepstral coefficients are employed to extract pronunciation features, taking into account the human ear's sensitivity to frequency. A recognition network architecture is then devised, which utilises gated convolutional networks to process continuous video frames, thereby extracting spatial and temporal features, and incorporates temporal attention for sequence learning. A classification model is then built upon residual networks and trained on pronunciation error features, with the capture results computed via a loss function. Experimental results show that the approach reaches a highest detection accuracy of 99.8%, underscoring its high efficacy in capturing English oral pronunciation errors.
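The building blocks named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' architecture: the function names, tensor shapes, the 1-D convolution over time, and the use of the common 2595·log10(1 + f/700) mel formula are all assumptions made for illustration. It shows a Hz-to-mel conversion (the perceptual frequency scale underlying Mel-frequency cepstral coefficients), a gated convolution in which one convolution's output is scaled by the sigmoid of another, and a residual (identity skip) connection around that gated layer.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Convert Hz to mel using the common 2595*log10(1 + f/700) formula (assumed variant)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_conv1d(x, w_feat, w_gate):
    """Gated 1-D convolution over time with zero 'same' padding.

    x:       (T, C_in) sequence of per-frame feature vectors (hypothetical shape)
    w_feat:  (K, C_in, C_out) weights of the feature convolution
    w_gate:  (K, C_in, C_out) weights of the gate convolution
    returns: (T, C_out) = conv(x, w_feat) * sigmoid(conv(x, w_gate))
    """
    k = w_feat.shape[0]
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    xp = np.pad(x, ((pad_left, pad_right), (0, 0)))
    # Sliding windows over time: shape (T, K, C_in)
    windows = np.stack([xp[t:t + k] for t in range(x.shape[0])])
    feat = np.einsum("tkc,kco->to", windows, w_feat)
    gate = np.einsum("tkc,kco->to", windows, w_gate)
    return feat * sigmoid(gate)


def residual_gated_block(x, w_feat, w_gate):
    """Identity skip connection around the gated layer (requires C_in == C_out)."""
    return x + gated_conv1d(x, w_feat, w_gate)


# Usage with random stand-in data: 50 frames of 13 coefficients each.
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 13))
w_f = 0.1 * rng.standard_normal((3, 13, 13))
w_g = 0.1 * rng.standard_normal((3, 13, 13))
out = residual_gated_block(frames, w_f, w_g)
print(out.shape)
```

The gate lets the layer suppress or pass each output channel per frame, while the skip connection keeps the input available unchanged, which is the general motivation for residual designs. How the paper combines these pieces with temporal attention is not specified in the abstract.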

Keywords: residual network; gated convolutional network; oral pronunciation; coarse-grained spatial features; timing characteristics.

DOI: 10.1504/IJISTA.2025.145621

International Journal of Intelligent Systems Technologies and Applications, 2025 Vol.23 No.1/2, pp.133 - 152

Received: 19 Aug 2024
Accepted: 14 Oct 2024
Published online: 09 Apr 2025
