Cross-modal context-gated convolution for multi-modal sentiment analysis

Authors
Publication date 06-2021
Journal Pattern Recognition Letters
Volume 146
Pages (from-to) 252-259
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
When inferring sentiments, relying on verbal clues alone is problematic because of their ambiguity; adding the related vocal and visual contexts as complements to the verbal clues can help. To infer sentiments from multi-modal temporal sequences, we need to identify both the sentiment-related clues and their cross-modal interactions. However, the sentiment-related behaviors of different modalities may not occur at the same time, and these behaviors and their interactions are sparse in time, making it hard to infer the correct sentiment. Moreover, unaligned sequences from different sensors have varying sampling rates, which amplifies the misalignment and sparsity mentioned above. While most previous multi-modal sentiment analysis work focuses only on word-aligned sequences, we propose cross-modal context-gated convolution for unaligned sequences. Cross-modal context-gated convolution captures local cross-modal interactions, handling the misalignment while reducing the effect of unrelated information. It introduces the concept of a cross-modal context gate, which allows it to capture useful cross-modal interactions more effectively, and it opens up more possibilities for layer design in multi-modal sequential modeling. Experiments on multi-modal sentiment analysis datasets under both word-aligned and unaligned conditions show the validity of our approach.
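The gating idea from the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's actual layer: a 1D convolution over one modality's sequence is modulated elementwise by a sigmoid gate computed from the other modality's features at the same time step, so unrelated cross-modal information is suppressed locally. All shapes, names (`cross_modal_context_gated_conv`, `w_conv`, `w_gate`), and the choice of a single linear gate projection are assumptions made for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_modal_context_gated_conv(x, ctx, w_conv, w_gate):
    """Illustrative sketch: convolve modality x, gate by context from modality ctx.

    x      : (T, d) target-modality sequence (e.g. verbal features)
    ctx    : (T, d) other-modality sequence (e.g. vocal or visual features)
    w_conv : (k, d, d) convolution kernel over a local window of size k
    w_gate : (d, d) linear projection producing the cross-modal context gate
    """
    T, d = x.shape
    k = w_conv.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))          # same-length ("same") padding
    out = np.zeros((T, d))
    for t in range(T):
        window = xp[t:t + k]                       # local receptive field in x
        conv = np.einsum('kd,kde->e', window, w_conv)
        gate = sigmoid(ctx[t] @ w_gate)            # context gate from the other modality
        out[t] = conv * gate                       # suppress unrelated interactions
    return out
```

Because the gate at step t depends only on the other modality's local context, the two streams do not need to be word-aligned for the gating to apply, which is the property the abstract emphasizes for unaligned sequences.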
Document type Article
Note With supplementary raw research data.
Language English
DOI https://doi.org/10.1016/j.patrec.2021.03.025