Title: Accurate recognition of emotions of audio-visual bimodal characters based on dual level feature dimensions

Authors: Xiao Zhang

Addresses: Chongqing Youth Vocational and Technical College, Chongqing, 400712, China

Abstract: To recognise the emotions of characters in audio-visual bimodal data accurately and quickly, an emotion recognition method based on dual level feature dimensions is proposed. Firstly, a logarithmic transformation and a cepstral function are applied to the audio data to extract emotional features from the character's audio signal. Secondly, local binary patterns and the Gabor wavelet transform are used to extract emotional feature maps from the character's video. Finally, the audio and video emotion features undergo cross-modal interaction, and the resulting visual and acoustic features are fed into a feature fusion model built on a gated neural network, which outputs the final audio-visual bimodal emotion recognition result. Experimental results show that, compared with existing methods, the proposed method reaches a highest recognition accuracy of 0.99, with the longest recognition time not exceeding 10 s.
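The three feature steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no parameters, so the function names, the Hann window, the 8-neighbour pattern layout, and the single sigmoid gate used for fusion are all assumptions chosen for illustration. The Gabor wavelet transform and the cross-modal interaction step are omitted.

```python
import numpy as np

def real_cepstrum(frame, eps=1e-10):
    """Real cepstrum of one audio frame: inverse FFT of the log magnitude spectrum."""
    frame = np.asarray(frame, dtype=float)
    # Hann window is an assumption; the abstract does not name a window.
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    log_mag = np.log(np.abs(spectrum) + eps)  # logarithmic transformation
    return np.fft.irfft(log_mag, n=len(frame))

def lbp_codes(image):
    """Basic 8-neighbour local binary pattern codes for the interior pixels."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    center = image[1:-1, 1:-1]
    # Neighbour offsets (dy, dx), clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = image[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.uint8) << bit
    return codes

def gated_fusion(audio_feat, visual_feat, w_gate, b_gate):
    """Hypothetical gated fusion: a sigmoid gate mixes two equal-length vectors."""
    joint = np.concatenate([audio_feat, visual_feat])
    gate = 1.0 / (1.0 + np.exp(-(w_gate @ joint + b_gate)))
    return gate * audio_feat + (1.0 - gate) * visual_feat
```

Here `w_gate` is assumed to have shape (d, 2d) and `b_gate` shape (d,), where d is the shared length of the audio and visual feature vectors. In a trained model these would be learned parameters. With all-zero gate parameters the gate equals 0.5 everywhere and the fusion reduces to the element-wise mean of the two inputs.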

Keywords: dual level feature dimension; audio and video dual-mode; character emotions; accurate recognition.

DOI: 10.1504/IJBM.2025.143730

International Journal of Biometrics, 2025 Vol.17 No.1/2, pp.202 - 213

Received: 26 Jan 2024
Accepted: 12 Apr 2024

Published online: 06 Jan 2025
