
Title: Intelligent assessment system for singing skills based on time-frequency feature decoupling

Authors: Ying Zhang; Ruixue Sun; Hongrun Shao; Chunmeng Zhao

Addresses (all four authors): Arts Department, Qinhuangdao Vocational and Technical College, Qinhuangdao, 066100, China

Abstract: Singing technique assessment is a crucial component of improving the quality of music education. Existing methods achieve limited assessment accuracy because their time and frequency features remain coupled. To address this, this paper first pre-processes singing audio to extract time-frequency features. It then combines depthwise separable convolutions with dilated convolutions to model frequency and temporal features simultaneously, and employs a residual structure to mitigate the vanishing-gradient problem in deep networks. Second, a spatio-temporal enhancement branch is constructed based on a bidirectional long short-term memory (BiLSTM) network; through a gating mechanism, the decoupled features are transmitted bidirectionally between the temporal and frequency domains. Finally, the decoupled time-frequency feature sequences are clustered so that the model can intelligently evaluate singing segments. Experimental results show that the proposed model improves evaluation accuracy by at least 4.71% over the baseline models.
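The convolutional front end described in the abstract (depthwise separable convolution, dilation, and a residual shortcut) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the time-frequency input is a matrix of channels (for example, frequency bins) by time frames, and it uses plain Python lists so it runs without a deep-learning framework. The function names, kernel values, dilation rate, ReLU activation, and identity shortcut are hypothetical choices made for illustration.

```python
from typing import List

Matrix = List[List[float]]  # indexed as [channel][time_frame]


def depthwise_dilated_conv(x: Matrix, kernels: Matrix, dilation: int) -> Matrix:
    """Apply one 1-D kernel per channel along time (cross-correlation, as in
    deep-learning frameworks), with dilation and zero 'same' padding so the
    output length equals the input length."""
    out: Matrix = []
    for ch, k in zip(x, kernels):
        # Offset that centres the dilated kernel on each frame (exact for odd sizes).
        half = (len(k) - 1) * dilation // 2
        row = []
        for t in range(len(ch)):
            acc = 0.0
            for j, w in enumerate(k):
                idx = t - half + j * dilation
                if 0 <= idx < len(ch):  # frames outside the signal count as zero
                    acc += w * ch[idx]
            row.append(acc)
        out.append(row)
    return out


def pointwise_conv(x: Matrix, weights: Matrix) -> Matrix:
    """1x1 convolution: each output channel is a weighted sum of all input
    channels at the same time frame, mixing information across frequency."""
    n_time = len(x[0])
    return [
        [sum(w * x[c][t] for c, w in enumerate(w_row)) for t in range(n_time)]
        for w_row in weights
    ]


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def residual_ds_block(x: Matrix, dw_kernels: Matrix, pw_weights: Matrix,
                      dilation: int) -> Matrix:
    """Hypothetical block: depthwise dilated conv -> pointwise conv -> ReLU,
    then an identity shortcut added back to the input."""
    y = relu(pointwise_conv(depthwise_dilated_conv(x, dw_kernels, dilation), pw_weights))
    # The identity shortcut only works if the channel count is unchanged.
    if len(y) != len(x):
        raise ValueError("pointwise output channels must match input channels")
    return [[a + b for a, b in zip(x_row, y_row)] for x_row, y_row in zip(x, y)]


if __name__ == "__main__":
    spectrogram = [[1, 0, 0, 0, 0],  # channel 0: impulse at frame 0
                   [0, 0, 1, 0, 0]]  # channel 1: impulse at frame 2
    out = residual_ds_block(spectrogram,
                            dw_kernels=[[1, 1, 1], [0, 1, 0]],
                            pw_weights=[[1, 0], [0, 1]],  # identity mixing
                            dilation=2)
    print(out)  # [[2.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0, 0.0]]
```

Factoring the convolution this way keeps the parameter count low: one size-k kernel per input channel plus a C_out × C_in pointwise matrix, instead of a full C_out × C_in × k kernel. Dilation widens the temporal receptive field without adding weights. In the demo, the dilation-2 kernel on channel 0 copies the frame-0 impulse to frame 2, two frames away.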

Keywords: singing technique evaluation; time-frequency feature decoupling; depthwise separable convolution; bidirectional long short-term memory model; feature clustering.

DOI: 10.1504/IJICT.2025.151079

International Journal of Information and Communication Technology, 2025 Vol.26 No.51, pp.18 - 33

Received: 04 Aug 2025
Accepted: 05 Nov 2025
Published online: 12 Jan 2026

