Title: LSB-DSN: sensor-assisted deep learning for robust English speech recognition
Authors: Aili Tang; Dezhi Zeng
Addresses: School of Liberal Education, Liuzhou Railway Vocational Technical College, Liuzhou Guangxi, 545616, China; Guangxi Taiji Kensijie Information System Consulting Co., Ltd., Liuzhou Guangxi, 545026, China
Abstract: In modern communication, English speech recognition systems are essential for personalising the user experience and supporting global communication. Such systems use sensor devices and deep learning techniques to remain robust across diverse environments. The performance of traditional systems degrades because of accents, varying pronunciations, and limited use of context. This paper introduces lion-swarm boosted deep sesame networks (LSB-DSN), a new speech recognition framework that fuses sensor technologies with deep learning to improve accuracy and robustness. The model combines acoustic signals with sensor-based inputs from accelerometers, gyroscopes, and electromyography devices to capture subtle speech-related modulations for better recognition across diverse environments. A hierarchical attention mechanism and lion swarm optimisation (LSO) enable optimal feature selection, reducing recognition error and computational overhead. Experiments show that the framework achieves a 9.5% word error rate in clean conditions, a low cross-entropy loss of 0.65, and a processing latency of 100 ms, results that are far superior to those of baseline models in noisy environments. The proposed framework can adapt to different accents and pronunciations, making it a strong solution for real-world speech recognition applications.
Keywords: English speech recognition; signal modulation; sensor devices; subnetworks; lion swarm optimisation; LSO; similarity measures; deep sesame networks.
DOI: 10.1504/IJSNET.2025.148452
International Journal of Sensor Networks, 2025 Vol.49 No.1, pp.50 - 68
Received: 27 Feb 2025
Accepted: 11 Mar 2025
Published online: 05 Sep 2025