Are You Paying Attention? Multimodal Linear Attention Transformers for Affect Prediction in Video Conversations

Jia Qing Poh, John See*, Neamat El Gayar, Lai Kuan Wong

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

The post-COVID-19 era has seen continued adoption of, and reliance on, video-based communication, underscoring the need for unobtrusive affect recognition in digital interactions. This paper proposes an efficient multimodal approach to emotion recognition in video conversational scenarios, leveraging linear attention-based Transformer networks to process both visual and audio cues. We explore various linear attention mechanisms, comparing them with classical self-attention. Using the K-EmoCon dataset, we demonstrate that the proposed approach yields competitive performance in predicting the affective states of conversing persons while significantly improving memory efficiency. Our ablation studies reveal that carefully tuned simple fusion methods can match or exceed more complex approaches. This research contributes to developing more accessible and efficient multimodal emotion recognition systems for video-based conversations, with applications for enhancing remote communication and monitoring digital well-being in the post-pandemic era.
Original language: English
Title of host publication: MRAC '24: Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing
Publisher: Association for Computing Machinery
Pages: 15-23
Number of pages: 9
ISBN (Electronic): 9798400712036
Publication status: Published - 28 Oct 2024
Event: 32nd ACM International Conference on Multimedia 2024 - Melbourne, Australia
Duration: 28 Oct 2024 to 1 Nov 2024
Conference number: 32
https://icmsaust.com.au/event/acm-international-conference-for-multimedia-2024/

Conference

Conference: 32nd ACM International Conference on Multimedia 2024
Abbreviated title: MM '24
Country/Territory: Australia
City: Melbourne
Period: 28/10/24 to 1/11/24

Keywords

  • multimodal transformers
  • linear attention
  • affect prediction
  • video conversations
