Context-Aware Multimodal Emotion Recognition

Aaishwarya Khalane*, Talal Shaikh

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)


Making human–computer interaction more organic and personalized essentially demands advances in human emotion recognition. Humans perceive emotions by considering multiple factors such as facial expressions, voice tonality, and informational context. Although significant research has been conducted on unimodal and multimodal emotion recognition in videos using acoustic and visual features, few papers have explored the potential of the textual information contained in video utterances. Humans experience emotions through their audio-visual and linguistic senses, making it essential to take the latter into account. This paper outlines two algorithms for recognizing multimodal emotional expressions in online videos. In addition to extracting acoustic (speech), visual (facial), and textual (utterance, via BERT) features, we use bidirectional LSTMs to capture the context between utterances. To obtain richer sequential information, we also implement a multi-head self-attention mechanism. Our analysis uses the benchmark CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) dataset, the largest dataset for sentiment analysis and emotion recognition to date. Our experiments yield improved F1 scores compared to the baseline models.
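As a rough illustration of the attention step mentioned in the abstract, the NumPy sketch below implements multi-head self-attention over a sequence of per-utterance feature vectors. All weights, dimensions, and the function name are hypothetical placeholders for illustration only, not the paper's implementation:

```python
import numpy as np

def multi_head_self_attention(x, num_heads=4, seed=0):
    """Multi-head self-attention over a sequence of utterance features.

    x: array of shape (seq_len, d_model), e.g. one fused feature vector per
    utterance in a video. The projection matrices are random stand-ins for
    the learned parameters a real model would train.
    Returns (output, attention_weights).
    """
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must divide evenly across heads"
    d_k = d_model // num_heads
    rng = np.random.default_rng(seed)
    # Random placeholders for the learned query/key/value/output projections.
    wq, wk, wv, wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))

    def split_heads(m):
        # (seq_len, d_model) -> (num_heads, seq_len, d_k)
        return m.reshape(seq_len, num_heads, d_k).transpose(1, 0, 2)

    q, k, v = split_heads(x @ wq), split_heads(x @ wk), split_heads(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)        # scaled dot-product
    scores -= scores.max(axis=-1, keepdims=True)            # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)                # softmax over keys
    out = attn @ v                                          # weighted sum of values
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)  # merge heads
    return out @ wo, attn

# Example: 6 utterances, each with a 32-dimensional fused feature vector.
features = np.random.default_rng(1).standard_normal((6, 32))
context, weights = multi_head_self_attention(features)
```

Each of the `num_heads` heads attends over all utterances in the sequence, so every output vector mixes in context from the whole video rather than a single utterance.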

Original language: English
Title of host publication: Proceedings of International Conference on Information Technology and Applications. ICITA 2021
Editors: Abrar Ullah, Steve Gill, Álvaro Rocha, Sajid Anwar
Number of pages: 11
ISBN (Electronic): 9789811676185
ISBN (Print): 9789811676178
Publication status: Published - 21 Apr 2022
Event: 15th International Conference on Information Technology and Applications 2021 - Dubai, United Arab Emirates
Duration: 13 Nov 2021 - 14 Nov 2021

Publication series

Name: Lecture Notes in Networks and Systems
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389


Conference: 15th International Conference on Information Technology and Applications 2021
Abbreviated title: ICITA 2021
Country/Territory: United Arab Emirates


Keywords

  • BERT
  • Context-aware
  • Emotion
  • Multi-head attention
  • Multimodal
  • Recognition

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Signal Processing
  • Computer Networks and Communications


