Title: Attention-based word-level contextual feature extraction and cross-modality fusion for sentiment analysis and emotion classification
Authors: Mahesh G. Huddar; Sanjeev S. Sannakki; Vijay S. Rajpurohit
Addresses: Department of Computer Science and Engineering, Hirasugar Institute of Technology, Nidasoshi, Belagavi, 591236, India; Department of Computer Science and Engineering, Gogte Institute of Technology, Belagavi, 590008, India; Department of Computer Science and Engineering, Gogte Institute of Technology, Belagavi, 590008, India
Abstract: Multimodal affective computing has become a popular research area due to the availability of a large amount of multimodal content. Feature alignment between modalities and multimodal fusion are the most important issues in multimodal affective computing. To address these issues, the proposed model extracts features at the word level, and forced alignment is used to capture the time-dependent interaction among the modalities. The contextual information among the words of an utterance and between nearby utterances is extracted using bidirectional long short-term memory (LSTM). A weighted-pooling-based attention model is used to select the important features within each modality and to weigh the importance of each modality. Information from multiple modalities is fused using a cross-modality fusion technique. The performance of the proposed model was tested on two standard datasets, IEMOCAP and CMU-MOSI. By incorporating word-level features, feature alignment, and cross-modality fusion, the proposed architecture outperforms the baselines in terms of classification accuracy.
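The abstract's pipeline (per-modality attention pooling followed by modality-weighted fusion) can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the scoring vector `w`, the function names, and the random word-level features are all hypothetical, and the paper's actual BiLSTM encoder and cross-modality fusion details differ.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(H, w):
    """Weighted pooling over timesteps.
    H: (T, d) hidden states (e.g., BiLSTM outputs for T words);
    w: (d,) scoring vector (hypothetical learned parameter)."""
    scores = H @ w                 # (T,) one relevance score per word
    alpha = softmax(scores)        # attention weights over words
    return alpha @ H               # (d,) pooled utterance representation

rng = np.random.default_rng(0)
d = 8
# hypothetical word-level features for three modalities of one utterance
text, audio, video = (rng.standard_normal((5, d)) for _ in range(3))
w = rng.standard_normal(d)

# intra-modality attention: pool word-level features within each modality
pooled = np.stack([attention_pool(m, w) for m in (text, audio, video)])  # (3, d)

# inter-modality attention: weigh the importance of each modality, then fuse
beta = softmax(pooled @ w)         # (3,) one weight per modality
fused = beta @ pooled              # (d,) cross-modality fused vector
print(fused.shape)
```

The same two-level attention (over words, then over modalities) mirrors the abstract's "select the important features within each modality" and "importance of each modality" steps; in the actual model these would operate on aligned BiLSTM states rather than random vectors.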
Keywords: affective computing; attention model; contextual fusion; cross-modality fusion; feature alignment; computer vision; deep learning; bidirectional recurrent neural network; sentiment analysis.
International Journal of Intelligent Engineering Informatics, 2020 Vol.8 No.1, pp.1-18
Received: 11 Jun 2019
Accepted: 23 Aug 2019
Published online: 20 Feb 2020