Title: Intelligent assessment system for singing skills based on time-frequency feature decoupling
Authors: Ying Zhang; Ruixue Sun; Hongrun Shao; Chunmeng Zhao
Addresses: Arts Department, Qinhuangdao Vocational and Technical College, Qinhuangdao, 066100, China (all authors)
Abstract: Singing technique assessment is a crucial component in improving the quality of music education. To address the insufficient assessment accuracy caused by the coupling of time-frequency features in existing methods, this paper first pre-processes singing audio to extract time-frequency features. It then models frequency and temporal features simultaneously by combining depthwise separable convolutions with dilated convolutions, and employs a residual network to mitigate the vanishing-gradient problem in deep network structures. Next, a spatio-temporal enhancement branch is constructed based on a bidirectional long short-term memory (BiLSTM) network; through a gating mechanism, decoupled features are exchanged bidirectionally between the temporal and frequency domains. The decoupled time-frequency feature sequences are then clustered so that the model can intelligently evaluate singing segments. Experimental results show that the proposed model improves evaluation accuracy by at least 4.71%, demonstrating a significant advantage over baseline models.
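To make the described architecture concrete, the following is a minimal sketch (not the authors' released code) of a time-frequency decoupling block as the abstract outlines it: a depthwise separable convolution over the frequency axis combined with a dilated convolution over the time axis, a residual shortcut, and a gated BiLSTM enhancement branch. Channel counts, kernel sizes, the mean-pooling over frequency, and the exact gating form are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of the time-frequency decoupling block described in the abstract.
# Layer sizes and the gating formulation are assumptions for illustration only.
import torch
import torch.nn as nn


class TimeFreqDecouplingBlock(nn.Module):
    def __init__(self, channels: int = 64, lstm_hidden: int = 64):
        super().__init__()
        # Depthwise separable convolution acting along the frequency axis
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=(3, 1),
                                   padding=(1, 0), groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)
        # Dilated convolution acting along the time axis to widen temporal context
        self.dilated = nn.Conv2d(channels, channels, kernel_size=(1, 3),
                                 padding=(0, 2), dilation=(1, 2))
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.ReLU()
        # Spatio-temporal enhancement branch: BiLSTM over the time dimension
        self.bilstm = nn.LSTM(input_size=channels, hidden_size=lstm_hidden,
                              batch_first=True, bidirectional=True)
        # Gate controlling how much temporal context is merged back
        self.gate = nn.Linear(2 * lstm_hidden, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, freq_bins, time_steps)
        h = self.act(self.norm(self.dilated(self.pointwise(self.depthwise(x)))))
        h = h + x  # residual shortcut to ease gradient flow in deep stacks

        # Collapse the frequency axis, then run the BiLSTM along time
        seq = h.mean(dim=2).transpose(1, 2)           # (batch, time, channels)
        ctx, _ = self.bilstm(seq)                     # (batch, time, 2*hidden)
        gate = torch.sigmoid(self.gate(ctx))          # (batch, time, channels)

        # Broadcast the gated temporal features back over the frequency axis
        gated = (gate * seq).transpose(1, 2).unsqueeze(2)  # (batch, C, 1, time)
        return h + gated


if __name__ == "__main__":
    block = TimeFreqDecouplingBlock()
    mel = torch.randn(2, 64, 128, 200)    # toy batch of spectrogram-like features
    print(block(mel).shape)               # torch.Size([2, 64, 128, 200])
```

The decoupled sequences produced by such a block would then be clustered (e.g., with a standard clustering algorithm) to score singing segments; that final stage is not shown here.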
Keywords: singing technique evaluation; time-frequency feature decoupling; depthwise separable convolution; bidirectional long short-term memory; feature clustering.
DOI: 10.1504/IJICT.2025.151079
International Journal of Information and Communication Technology, 2025 Vol.26 No.51, pp.18 - 33
Received: 04 Aug 2025
Accepted: 05 Nov 2025
Published online: 12 Jan 2026


