Multi-scale Spatiotemporal Information Fusion Network for Video Action Recognition

Yutong Cai, Weiyao Lin, John See, Ming-Ming Cheng, Guangcan Liu, Hongkai Xiong

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Two-stream convolutional networks have shown excellent performance in video action recognition in recent years. However, it remains unclear how to model the correlation between the temporal and spatial streams more effectively. First, the spatial stream and the temporal stream attend to different aspects of a video, which can lead to different recognition results. Second, variation in the length of the optical flow fields tends to have a great impact on the classification results. In this paper, we propose a novel multi-scale spatiotemporal information fusion network to fuse the spatial and temporal features. Specifically, our network exploits multi-scale temporal information to make better use of motion cues. Considering the complementary relationship between the spatial and temporal features, we adopt hierarchical fusion strategies and an asynchronous fusion method to fuse the two-stream features. Experimental results on two benchmark datasets (UCF101 and HMDB51) show that the proposed network achieves competitive performance.

Original language: English
Title of host publication: 2018 IEEE Visual Communications and Image Processing (VCIP)
Publisher: IEEE
ISBN (Electronic): 9781538644584
DOIs
Publication status: Published - 25 Apr 2019
Event: 33rd IEEE International Conference on Visual Communications and Image Processing 2018 - Taichung, Taiwan, Province of China
Duration: 9 Dec 2018 to 12 Dec 2018

Conference

Conference: 33rd IEEE International Conference on Visual Communications and Image Processing 2018
Abbreviated title: VCIP 2018
Country: Taiwan, Province of China
City: Taichung
Period: 9/12/18 to 12/12/18

Keywords

  • Action recognition
  • Asynchronous fusion
  • Convolutional network
  • Hierarchical fusion strategies
  • Multi-scale spatiotemporal information

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Signal Processing
