Article: Attention-based word-level contextual feature extraction and cross-modality fusion for sentiment analysis and emotion classification
Journal: International Journal of Intelligent Engineering Informatics (IJIEI) 2020 Vol.8 No.1 pp.1 - 18

Title: Attention-based word-level contextual feature extraction and cross-modality fusion for sentiment analysis and emotion classification

Authors: Mahesh G. Huddar; Sanjeev S. Sannakki; Vijay S. Rajpurohit

Addresses: Department of Computer Science and Engineering, Hirasugar Institute of Technology, Nidasoshi, Belagavi, 591236, India; Department of Computer Science and Engineering, Gogte Institute of Technology, Belagavi, 590008, India; Department of Computer Science and Engineering, Gogte Institute of Technology, Belagavi, 590008, India

Abstract: Multimodal affective computing has become a popular research area because a large amount of multimodal content is now available. Feature alignment between modalities and multimodal fusion are the most important issues in multimodal affective computing. To address them, the proposed model extracts features at the word level and uses forced alignment to capture the time-dependent interaction among the modalities. Contextual information among the words of an utterance, and between neighbouring utterances, is extracted using bidirectional long short-term memory (LSTM) networks. A weighted-pooling-based attention model selects the important features within each modality and determines the importance of each modality. Information from the multiple modalities is then combined using a cross-modality fusion technique. The proposed model was evaluated on two standard datasets, IEMOCAP and CMU-MOSI. By incorporating word-level features, feature alignment, and cross-modality fusion, the proposed architecture outperforms the baselines in terms of classification accuracy.
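The word-level alignment and weighted-pooling attention named in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes that frame-level acoustic or visual features are averaged over the word time spans produced by a forced aligner, and that attention scores are a dot product with a scoring vector `w` followed by a softmax. The function names `align_to_words` and `attention_pool`, the averaging rule, and the dot-product scoring are illustrative assumptions, not the authors' published implementation; the BiLSTM context encoder and the paper's cross-modality fusion step are omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def align_to_words(frames: Sequence[Tuple[float, Vector]],
                   word_spans: Sequence[Tuple[float, float]]) -> List[Vector]:
    """Average the frame features whose timestamp falls in each word's [start, end) span.

    `frames` holds hypothetical (timestamp_seconds, feature_vector) pairs and
    `word_spans` holds the (start, end) times a forced aligner reported per word.
    """
    dim = len(frames[0][1])
    words: List[Vector] = []
    for start, end in word_spans:
        inside = [vec for t, vec in frames if start <= t < end]
        if not inside:  # no frame inside this word: fall back to a zero vector
            words.append([0.0] * dim)
            continue
        words.append([sum(col) / len(inside) for col in zip(*inside)])
    return words


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: Sequence[Vector], w: Vector) -> Tuple[Vector, List[float]]:
    """Weighted pooling: score each vector against w, softmax the scores, take the weighted sum."""
    scores = [sum(a * b for a, b in zip(v, w)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(wt * v[d] for wt, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    # Toy acoustic stream: 2-D features sampled every 0.1 s, two aligned words.
    frames = [(0.0, [1.0, 0.0]), (0.1, [3.0, 0.0]), (0.2, [0.0, 2.0]), (0.3, [0.0, 4.0])]
    spans = [(0.0, 0.2), (0.2, 0.4)]
    acoustic_words = align_to_words(frames, spans)
    print(acoustic_words)  # [[2.0, 0.0], [0.0, 3.0]]

    # Word-level attention inside one modality (w would be learned in practice).
    pooled, weights = attention_pool(acoustic_words, [0.0, 0.0])
    print(pooled, weights)  # [1.0, 1.5] [0.5, 0.5]

    # Modality-level attention over hypothetical text, audio and video utterance vectors.
    fused, modality_weights = attention_pool([[0.2, 0.9], pooled, [0.4, 0.1]], [1.0, 1.0])
    print(modality_weights)
```

Reusing the same `attention_pool` at both levels is a simplifying choice: one call weights the words within a modality, and a second call over one pooled vector per modality yields per-modality importance weights, mirroring the two roles the abstract assigns to the attention model.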

Keywords: affective computing; attention model; contextual fusion; cross-modality fusion; feature alignment; computer vision; deep learning; bidirectional recurrent neural network; sentiment analysis.

DOI: 10.1504/IJIEI.2020.105430

International Journal of Intelligent Engineering Informatics, 2020 Vol.8 No.1, pp.1 - 18

Received: 11 Jun 2019
Accepted: 23 Aug 2019
Published online: 28 Feb 2020
