Title: Assisted English oral teaching with mouth recognition technology
Authors: Shaoli Xiong; Rui Cong; Lin Siew Eng; Chunyan Ruan
Addresses: UCSI University, Kuala Lumpur 56000, Malaysia; Dongguan City University, Dongguan 523000, Guangdong, China ' Dongguan City University, Dongguan 523000, Guangdong, China ' UCSI University, Kuala Lumpur 56000, Malaysia ' Dongguan City University, Dongguan 523000, Guangdong, China
Abstract: Deep learning-based speech emotion analysis has been widely applied in English oral teaching. Because oral pronunciation is closely related to mouth shape, mouth recognition has received increasing attention as an aid to speech recognition. To improve the effectiveness of English oral teaching, this paper constructs an efficient emotion analysis model that incorporates human mouth-shape features. Specifically, we first use the Dlib tool to locate 68 facial landmarks and crop the mouth-associated landmark information. To better extract mouth landmark features, we introduce a spatiotemporal graph convolutional network, which effectively mines spatial and temporal features from the landmark sequences. In addition, to model both local and global dependencies, we utilise the Focal-Transformer for speech feature extraction. To verify the effectiveness of the proposed model, we conducted extensive comparative experiments on two publicly available multimodal sentiment analysis datasets and a self-built English oral teaching dataset. The experimental results confirm that our proposed model achieves higher performance than other deep models.
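The landmark-localisation step described in the abstract can be sketched as follows. In Dlib's standard 68-point scheme, indices 48–67 cover the outer and inner lips; the `margin` parameter and the cropping helper below are illustrative assumptions, not details taken from the paper, and the actual Dlib calls (which require the pretrained `shape_predictor_68_face_landmarks.dat` model) are shown only as comments.

```python
# Sketch of cropping the mouth region from Dlib's 68 facial landmarks.
# Indices 48..67 are the mouth landmarks in Dlib's 68-point scheme;
# the margin value is an illustrative assumption.

MOUTH_START, MOUTH_END = 48, 68  # mouth landmarks occupy indices 48..67

def mouth_bounding_box(landmarks, margin=5):
    """Given 68 (x, y) landmark tuples, return a padded bounding box
    (left, top, right, bottom) around the mouth region."""
    mouth = landmarks[MOUTH_START:MOUTH_END]
    xs = [p[0] for p in mouth]
    ys = [p[1] for p in mouth]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)

# With Dlib itself (requires the pretrained model file):
# import dlib
# detector = dlib.get_frontal_face_detector()
# predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
# face = detector(image)[0]
# shape = predictor(image, face)
# pts = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
# mouth_box = mouth_bounding_box(pts)
```

The per-frame mouth landmarks obtained this way form the node sequence that a spatiotemporal graph convolutional network can then process.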
Keywords: deep learning; mouth recognition; English oral teaching; multimodal emotion analysis; facial landmarks; focal-transformer.
DOI: 10.1504/IJCAT.2025.148140
International Journal of Computer Applications in Technology, 2025 Vol.76 No.1/2, pp.55 - 64
Received: 27 Oct 2023
Accepted: 14 Jun 2024
Published online: 27 Aug 2025