Spatio-temporal mid-level feature bank for action recognition in low quality video

Saimunur Rahman, John See

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

It is a great challenge to perform high-level recognition tasks on videos that are poor in quality. In this paper, we propose a new spatio-temporal mid-level (STEM) feature bank for recognizing human actions in low quality videos. The feature bank comprises a trio of local spatio-temporal features, i.e. shape, motion and texture, which respectively encode structural, dynamic and statistical information in video. These features are encoded into mid-level representations and aggregated to construct STEM. Based on the recent binarized statistical image feature (BSIF), we also design a new spatio-temporal textural feature that extracts discriminative information from 3D salient patches. Extensive experiments on the poor-quality versions/subsets of the KTH and HMDB51 datasets demonstrate the effectiveness of the proposed approach.
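The BSIF descriptor mentioned in the abstract works by convolving an image patch with a bank of linear filters, binarizing each response at zero, and packing the resulting bits into an integer code per pixel; a histogram of these codes is the feature. The sketch below illustrates that encoding pipeline in plain Python. It is a minimal illustration, not the authors' implementation: real BSIF uses filters learned with ICA from natural image patches, whereas random zero-mean filters stand in here, and the paper's extension further operates on 3D (spatio-temporal) salient patches rather than a single 2D image.

```python
import random

def convolve_valid(img, kernel):
    """'Valid'-mode 2D filtering on nested-list images (cross-correlation;
    the distinction from convolution is immaterial for this illustration)."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(img), len(img[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            s = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    s += img[y + dy][x + dx] * kernel[dy][dx]
            row.append(s)
        out.append(row)
    return out

def bsif_histogram(img, filters):
    """Binarize each filter response at zero, pack the n bits into an
    integer code per pixel, and histogram the 2**n possible codes."""
    n = len(filters)
    responses = [convolve_valid(img, f) for f in filters]
    h, w = len(responses[0]), len(responses[0][0])
    hist = [0] * (2 ** n)
    for y in range(h):
        for x in range(w):
            code = 0
            for i in range(n):
                if responses[i][y][x] > 0:
                    code |= 1 << i
            hist[code] += 1
    return hist

# Placeholder filters: real BSIF filters are learned with ICA;
# random zero-mean filters are used here purely for illustration.
random.seed(0)
def rand_filter(k):
    f = [[random.gauss(0, 1) for _ in range(k)] for _ in range(k)]
    mean = sum(sum(row) for row in f) / (k * k)
    return [[v - mean for v in row] for row in f]

img = [[random.random() for _ in range(12)] for _ in range(12)]  # toy "frame"
filters = [rand_filter(3) for _ in range(4)]                     # n = 4 bits
hist = bsif_histogram(img, filters)
print(len(hist), sum(hist))  # 16 code bins; counts sum to 10*10 valid pixels
```

With 4 filters the code book has 16 entries, and every valid pixel contributes exactly one count, so the histogram sums to the number of valid response pixels.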

Original language: English
Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: IEEE
Pages: 1846-1850
Number of pages: 5
ISBN (Electronic): 9781479999880
DOIs
Publication status: Published - 19 May 2016
Event: 41st IEEE International Conference on Acoustics, Speech and Signal Processing 2016 - Shanghai International Convention Center, Shanghai, China
Duration: 20 Mar 2016 – 25 Mar 2016

Conference

Conference: 41st IEEE International Conference on Acoustics, Speech and Signal Processing 2016
Abbreviated title: ICASSP 2016
Country/Territory: China
City: Shanghai
Period: 20/03/16 – 25/03/16

Keywords

  • Action recognition
  • BSIF
  • Low quality video
  • Mid-level representation
  • Texture features

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
