Title: TSAIE-AppLing: multimodal sentiment analysis of image-enhanced text from a linguistic perspective
Authors: Huan Liu
Addresses: School of Literature and Journalism, Sichuan University Jinjiang College, Meishan, 620000, China
Abstract: To enhance the sentiment correlation between images and texts, this paper proposes a multimodal sentiment analysis approach for image-enhanced text from a linguistic perspective (TSAIE-AppLing). First, bidirectional encoder representations from transformers (BERT) is introduced to encode textual features, and image features are extracted with a vision transformer (ViT); these are combined with a multi-head self-attention mechanism to capture cross-modal global semantic features. Then, dilated convolution is used to strengthen the feature association between image patches and aggregate cross-patch features, a multi-head cross-attention mechanism is designed to achieve inter-modal interaction and alignment, a graph convolutional network (GCN) is used to enhance the textual semantic features related to the image, and the final sentiment polarity is determined through a softmax function. Experimental results on the MVSA dataset show that the proposed method improves classification accuracy by at least 2.75%, significantly improving multimodal sentiment analysis performance.
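The inter-modal alignment step described in the abstract can be illustrated with a minimal sketch of multi-head cross-attention, where text token features (queries) attend over image patch features (keys/values). This is not the authors' implementation: the weights are random stand-ins for learned projections, and the shapes (12 BERT tokens, 49 ViT patches, 64-dimensional features, 4 heads) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(text, image, num_heads, rng):
    """Text queries attend over image keys/values: one direction of the
    inter-modal interaction alignment sketched in the abstract."""
    d_model = text.shape[-1]
    d_head = d_model // num_heads
    # Random projections stand in for learned parameter matrices.
    Wq = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    Wk = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    Wv = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

    def split(x):  # (seq, d_model) -> (heads, seq, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(text @ Wq), split(image @ Wk), split(image @ Wv)
    # Scaled dot-product attention per head, then merge heads back.
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))
    out = (attn @ v).transpose(1, 0, 2).reshape(text.shape[0], d_model)
    return out

rng = np.random.default_rng(0)
text_feats = rng.standard_normal((12, 64))   # e.g. 12 BERT token vectors
image_feats = rng.standard_normal((49, 64))  # e.g. 49 ViT patch vectors
fused = multi_head_cross_attention(text_feats, image_feats, num_heads=4, rng=rng)
```

The output keeps the text sequence length but carries image-conditioned information, which could then feed a GCN and a softmax classifier as the method outlines.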
Keywords: multimodal sentiment analysis; BERT model; multi-head cross attention; graph convolutional network.
DOI: 10.1504/IJICT.2025.146378
International Journal of Information and Communication Technology, 2025 Vol.26 No.16, pp.69 - 84
Received: 27 Mar 2025
Accepted: 10 Apr 2025
Published online: 27 May 2025