Title: A method for capturing English oral pronunciation errors based on residual networks and gated convolutional networks
Authors: Mingxia Jiang; Yao Zhao
Addresses: College of Navigation Technology, Jiangsu Maritime Institute, Nanjing, 211170, China; Dean's Office, Jiangsu Maritime Institute, Nanjing, 211170, China
Abstract: To address the low classification accuracy and poor capture accuracy in detecting English oral pronunciation errors, this study introduces a novel approach based on residual networks and gated convolutional networks. First, a multimodal English spoken pronunciation corpus is built by converting annotated corpus data. Next, Mel frequency cepstral coefficients (MFCCs) are used to extract pronunciation features, taking into account the human ear's nonlinear sensitivity to frequency. A recognition network is then designed that uses gated convolutional networks to process continuous video frames, thereby extracting spatial and temporal features, and incorporates temporal attention for sequence learning. Finally, a classification model built on residual networks is trained on pronunciation error features, and the capture results are computed via a loss function. Experimental results show that the approach reaches a highest detection accuracy of 99.8%, indicating that it captures English oral pronunciation errors effectively.
Keywords: residual network; gated convolutional network; oral pronunciation; coarse-grained spatial features; timing characteristics.
DOI: 10.1504/IJISTA.2025.145621
International Journal of Intelligent Systems Technologies and Applications, 2025 Vol.23 No.1/2, pp.133 - 152
Received: 19 Aug 2024
Accepted: 14 Oct 2024
Published online: 09 Apr 2025
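The abstract names its building blocks without stating their equations. As a rough illustration only, the following minimal sketch implements textbook forms of three of them in plain Python: the HTK-style mel mapping that underlies MFCC features, a residual block wrapped around a gated 1-D convolution (output = x + A ⊙ σ(B), the gated-linear-unit form), and softmax attention pooling over time steps. It is not the authors' model. The function names, the (time steps × channels) layout, the kernel shapes, and the dot-product attention score are all assumptions, and the paper's networks operate on video frames with learned weights rather than the hand-set toy weights used here.

```python
import math
from typing import List

Seq = List[List[float]]           # T time steps x C channels (assumed layout)
Kernel = List[List[List[float]]]  # K taps x C_in x C_out (assumed layout)


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel mapping: near-linear below ~1 kHz, logarithmic above,
    reflecting the ear's coarser frequency resolution at high frequencies."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def conv1d_same(x: Seq, w: Kernel, b: List[float]) -> Seq:
    """Temporal 1-D convolution (cross-correlation, as in deep learning
    frameworks) with zero padding so the output keeps T steps; K should be odd."""
    T, k = len(x), len(w)
    c_in, c_out = len(x[0]), len(b)
    pad = k // 2
    out: Seq = []
    for t in range(T):
        row = list(b)
        for j in range(k):
            src = t + j - pad
            if 0 <= src < T:  # positions outside the sequence act as zeros
                for ci in range(c_in):
                    for co in range(c_out):
                        row[co] += x[src][ci] * w[j][ci][co]
        out.append(row)
    return out


def gated_conv1d(x: Seq, w_a: Kernel, b_a: List[float],
                 w_g: Kernel, b_g: List[float]) -> Seq:
    """Gated convolution: linear path (x*W_a + b_a) scaled elementwise by
    a sigmoid gate sigma(x*W_g + b_g)."""
    a = conv1d_same(x, w_a, b_a)
    g = conv1d_same(x, w_g, b_g)
    return [[ai * sigmoid(gi) for ai, gi in zip(ra, rg)] for ra, rg in zip(a, g)]


def residual_gated_block(x: Seq, w_a: Kernel, b_a: List[float],
                         w_g: Kernel, b_g: List[float]) -> Seq:
    """Residual (identity-shortcut) block: output = x + F(x).
    Requires C_out == C_in so the shapes match."""
    f = gated_conv1d(x, w_a, b_a, w_g, b_g)
    return [[xi + fi for xi, fi in zip(rx, rf)] for rx, rf in zip(x, f)]


def temporal_attention_pool(x: Seq, score_w: List[float]) -> List[float]:
    """Hypothetical temporal attention: score each time step by a dot product
    with score_w, softmax the scores, and return the weighted sum of steps."""
    scores = [sum(v * s for v, s in zip(row, score_w)) for row in x]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(weights[t] * x[t][ch] for t in range(len(x)))
            for ch in range(len(x[0]))]


if __name__ == "__main__":
    x = [[1.0], [2.0], [3.0]]                 # toy sequence: 3 steps, 1 channel
    w_identity = [[[0.0]], [[1.0]], [[0.0]]]  # K=3 kernel that copies the input
    w_zero = [[[0.0]], [[0.0]], [[0.0]]]      # gate pre-activation is 0, so gate = 0.5
    print(residual_gated_block(x, w_identity, [0.0], w_zero, [0.0]))
    # prints [[1.5], [3.0], [4.5]]
    print(temporal_attention_pool(x, [0.0]))  # uniform weights, about the mean 2.0
```

With an identity kernel on the linear path and an all-zero gate path, every gate equals σ(0) = 0.5, so the block returns 1.5 times its input; this makes the contribution of the identity shortcut easy to check by hand.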