Title: Image-text multimodal sentiment analysis method integrating multi-themes and multi-labels
Authors: Shunxiang Zhang; Longhui Hu; Shuyu Li; Wenjie Duan; Xiaolong Wang
Addresses: School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, 232001, China; School of Artificial Intelligence, Anhui University of Science and Technology, Huainan, 232001, China
Abstract: Current image-text sentiment analysis models focus only on the content of the text and image, ignoring the synergistic effect of theme and label information on the semantic features of both modalities. Therefore, we propose a multimodal sentiment analysis method integrating multi-themes and multi-labels. First, the global and local features of the image are obtained by a CNN and Faster-RCNN, respectively. Bi-LSTM is used to obtain word-level and sentence-level features of the text, and BERT is responsible for extracting the theme-label features. Then, an attention network performs feature interaction to generate word-local correlation features, and the text's sentence-level features are combined with the image's global features to generate joint image-sentence features. Finally, these two features are fused with the theme-label features to obtain the sentiment analysis results. The experimental results demonstrate that the proposed method improves the accuracy of image-text sentiment analysis.
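The pipeline described in the abstract (attention-based interaction between word-level text features and local image features, plus concatenation of sentence-level and global image features with theme-label features) can be sketched conceptually as follows. This is a minimal NumPy illustration, not the authors' implementation: the feature dimensions, the use of mean pooling, random placeholder inputs, and the single-matrix classifier head are all illustrative assumptions; in the paper the inputs would come from Bi-LSTM, CNN, Faster-RCNN, and BERT encoders.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def word_local_attention(word_feats, local_feats):
    # cross-modal attention: each word attends over image regions,
    # yielding word-local correlation features (one per word)
    d = word_feats.shape[-1]
    scores = word_feats @ local_feats.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ local_feats

rng = np.random.default_rng(0)
d = 64
word_feats = rng.normal(size=(12, d))   # word-level text features (placeholder for Bi-LSTM output)
local_feats = rng.normal(size=(5, d))   # local image features (placeholder for Faster-RCNN output)
sentence_feat = word_feats.mean(axis=0) # sentence-level feature (illustrative pooling)
global_feat = rng.normal(size=d)        # global image feature (placeholder for CNN output)
theme_label_feat = rng.normal(size=d)   # theme-label feature (placeholder for BERT output)

# feature interaction via attention, then joint image-sentence features
corr_feats = word_local_attention(word_feats, local_feats)        # (12, d)
joint_feats = np.concatenate([sentence_feat, global_feat])        # (2d,)

# fuse both with the theme-label features and classify (3 sentiment classes, illustrative)
fused = np.concatenate([corr_feats.mean(axis=0), joint_feats, theme_label_feat])
logits = fused @ rng.normal(size=(fused.shape[0], 3))
probs = softmax(logits)
```

The sketch shows only the flow of features through interaction, joining, and fusion; the actual model would learn the attention and classifier parameters end to end.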
Keywords: multimodal sentiment analysis; multi-theme labels; modal fusion; target detection; attention network.
DOI: 10.1504/IJCSE.2025.146071
International Journal of Computational Science and Engineering, 2025 Vol.28 No.3, pp.292 - 302
Received: 03 Mar 2024
Accepted: 23 May 2024
Published online: 06 May 2025