Title: TSAIE-AppLing: multimodal sentiment analysis of image-enhanced text from a linguistic perspective
Authors: Huan Liu
Addresses: School of Literature and Journalism, Sichuan University Jinjiang College, Meishan, 620000, China
Abstract: To enhance the sentiment correlation between images and texts, this paper proposes a multimodal sentiment analysis approach for image-enhanced text from a linguistic perspective (TSAIE-AppLing). First, bidirectional encoder representations from transformers (BERT) is introduced to encode textual features, and image features are extracted with a vision transformer (ViT); these are combined with a multi-head self-attention mechanism to capture cross-modal global semantic features. Then, dilated convolution is used to strengthen the feature association between image patches and aggregate cross-patch features, a multi-head cross-attention mechanism is designed to achieve inter-modal interaction and alignment, a graph convolutional network (GCN) is used to enhance the textual semantic features related to the image, and the final sentiment polarity is determined through a softmax function. Experimental results on the MVSA dataset show that the proposed method improves classification accuracy by at least 2.75%, significantly improving multimodal sentiment analysis performance.
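The inter-modal alignment step described in the abstract can be illustrated with a minimal sketch of multi-head cross-attention, where text token features (queries) attend over image patch features (keys/values). This is not the authors' implementation: the weights are random stand-ins for learned projections, and the shapes (12 BERT tokens, 49 ViT patches, 64-dimensional features, 4 heads) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(text, image, num_heads, rng):
    """Text queries attend over image keys/values: one direction of the
    inter-modal interaction alignment sketched in the abstract."""
    d_model = text.shape[-1]
    d_head = d_model // num_heads
    # Random projections stand in for learned parameter matrices.
    Wq = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    Wk = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    Wv = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

    def split(x):  # (seq, d_model) -> (heads, seq, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(text @ Wq), split(image @ Wk), split(image @ Wv)
    # Scaled dot-product attention per head, then merge heads back.
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))
    out = (attn @ v).transpose(1, 0, 2).reshape(text.shape[0], d_model)
    return out

rng = np.random.default_rng(0)
text_feats = rng.standard_normal((12, 64))   # e.g. 12 BERT token vectors
image_feats = rng.standard_normal((49, 64))  # e.g. 49 ViT patch vectors
fused = multi_head_cross_attention(text_feats, image_feats, num_heads=4, rng=rng)
```

The output keeps the text sequence length but carries image-conditioned information, which could then feed a GCN and a softmax classifier as the method outlines.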
Keywords: multimodal sentiment analysis; BERT model; multi-head cross attention; graph convolutional network.
DOI: 10.1504/IJICT.2025.146378
International Journal of Information and Communication Technology, 2025 Vol.26 No.16, pp.69 - 84
Received: 27 Mar 2025
Accepted: 10 Apr 2025
Published online: 27 May 2025