A Novel Multi-Modal Network-Based Dynamic Scene Understanding

Md Azher Uddin, Joolekha Bibi Joolee, Young Koo Lee*, Kyung Ah Sohn

*Corresponding author for this work

Research output: Contribution to journal > Article > peer-review

1 Citation (Scopus)

Abstract

In recent years, dynamic scene understanding has gained attention from researchers because of its widespread applications. The key to understanding dynamic scenes successfully lies in jointly representing appearance and motion features to obtain an informative description. Numerous methods have been introduced to solve the dynamic scene recognition problem; nevertheless, a few concerns still need to be investigated. In this article, we introduce a novel multi-modal network for dynamic scene understanding from video data, which captures both spatial appearance and temporal dynamics effectively. Furthermore, two-level joint tuning layers are proposed to integrate the global and local spatial features as well as the deep features of the spatial and temporal streams. To extract the temporal information, we present a novel dynamic descriptor, namely Volume Symmetric Gradient Local Graph Structure (VSGLGS), which generates temporal feature maps similar to optical flow maps while overcoming the issues of optical flow maps. Additionally, a handcrafted spatiotemporal feature descriptor based on the Volume Local Directional Transition Pattern (VLDTP) is introduced, which extracts directional information by exploiting edge responses. Lastly, a stacked Bidirectional Long Short-Term Memory (Bi-LSTM) network along with a temporal mixed pooling scheme is designed to capture the dynamic information without noise interference. Extensive experiments show that the proposed multi-modal network outperforms most state-of-the-art approaches for dynamic scene understanding.
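The abstract does not define the temporal mixed pooling scheme. As a rough illustration only, the sketch below assumes that "mixed" means a weighted blend of max pooling and average pooling along the time axis. The function name, the `alpha` weight, and the sample values are hypothetical and are not taken from the article.

```python
from typing import List, Sequence


def temporal_mixed_pooling(
    frames: Sequence[Sequence[float]], alpha: float = 0.5
) -> List[float]:
    """Pool a (T x D) sequence of per-frame feature vectors into one D-vector.

    Assumption: mixed pooling = alpha * max-pool + (1 - alpha) * average-pool
    over the time axis. The article may define the scheme differently.
    """
    if not frames:
        raise ValueError("need at least one frame")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    num_frames = len(frames)
    pooled = []
    # zip(*frames) yields, for each feature dimension, its values across time.
    for channel in zip(*frames):
        max_part = max(channel)
        avg_part = sum(channel) / num_frames
        pooled.append(alpha * max_part + (1.0 - alpha) * avg_part)
    return pooled


# Three time steps, two feature channels (hypothetical values).
features = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
print(temporal_mixed_pooling(features, alpha=0.5))  # prints [2.5, 3.0]
```

With `alpha=1.0` this reduces to plain max pooling over time, and with `alpha=0.0` to plain average pooling. In a setting like the one described, the per-frame vectors would presumably be the Bi-LSTM outputs at each time step.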

Original language: English
Article number: 7
Journal: ACM Transactions on Multimedia Computing, Communications and Applications
Volume: 18
Issue number: 1
Publication status: Published - 27 Jan 2022

Keywords

  • Multi-modal network
  • stacked Bi-LSTM network
  • temporal mixed pooling
  • volume local directional transition pattern
  • volume symmetric gradient local graph structure

ASJC Scopus subject areas

  • Hardware and Architecture
  • Computer Networks and Communications
