Abstract
Performing high-level recognition tasks on poor-quality videos is a great challenge. In this paper, we propose a new spatio-temporal mid-level (STEM) feature bank for recognizing human actions in low-quality videos. The feature bank comprises a trio of local spatio-temporal features, namely shape, motion and texture, which respectively encode structural, dynamic and statistical information in video. These features are encoded into mid-level representations and aggregated to construct STEM. Building on the recent binarized statistical image features (BSIF), we also design a new spatio-temporal textural feature that is extracted discriminatively from 3D salient patches. Extensive experiments on the poor-quality versions/subsets of the KTH and HMDB51 datasets demonstrate the effectiveness of the proposed approach.
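To make the BSIF idea concrete, here is a minimal sketch of the generic recipe: filter a patch, binarize each filter response by its sign, pack the bits into an integer code, and histogram the codes. Several parts are assumptions and not the paper's method. The filters are random zero-mean placeholders (real BSIF learns its filters with ICA on natural image patches). The spatio-temporal extension pools codes over the XY, XT and YT planes of a video patch in the style of LBP-TOP, which is a hypothetical choice; the abstract does not say how STEM extracts texture from 3D salient patches. All function names are hypothetical.

```python
import random
from typing import List

Grid = List[List[float]]  # 2D array indexed [row][col]


def make_filters(n_filters: int, size: int, seed: int = 0) -> List[Grid]:
    """Hypothetical stand-in filter bank: zero-mean random filters.
    Real BSIF learns its filters with ICA on natural image patches."""
    rng = random.Random(seed)
    bank = []
    for _ in range(n_filters):
        w = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
        mean = sum(map(sum, w)) / (size * size)
        bank.append([[v - mean for v in row] for row in w])
    return bank


def bsif_codes(img: Grid, bank: List[Grid]) -> List[int]:
    """One n-bit code per valid (unpadded) position: bit i is set when
    filter i's response (a sliding dot product) there is positive."""
    k = len(bank[0])
    codes = []
    for r in range(len(img) - k + 1):
        for c in range(len(img[0]) - k + 1):
            code = 0
            for i, f in enumerate(bank):
                s = sum(f[u][v] * img[r + u][c + v]
                        for u in range(k) for v in range(k))
                if s > 0:
                    code |= 1 << i
            codes.append(code)
    return codes


def histogram(codes: List[int], n_bins: int) -> List[float]:
    """L1-normalized histogram of integer codes."""
    h = [0.0] * n_bins
    for code in codes:
        h[code] += 1
    total = len(codes)
    return [x / total for x in h] if total else h


def bsif_top(vol: List[Grid], bank: List[Grid]) -> List[float]:
    """BSIF on three orthogonal planes of a video patch vol[t][y][x]:
    pool codes over all XY, XT and YT slices, then concatenate the three
    histograms (LBP-TOP style; an assumption, not the paper's design)."""
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    n_bins = 2 ** len(bank)
    xy = [code for t in range(T) for code in bsif_codes(vol[t], bank)]
    xt = [code for y in range(H) for code in bsif_codes(
        [[vol[t][y][x] for x in range(W)] for t in range(T)], bank)]
    yt = [code for x in range(W) for code in bsif_codes(
        [[vol[t][y][x] for y in range(H)] for t in range(T)], bank)]
    return histogram(xy, n_bins) + histogram(xt, n_bins) + histogram(yt, n_bins)


# Usage: a random 6-frame, 8x8-pixel patch and 4 filters of size 3x3.
rng = random.Random(1)
patch = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(6)]
bank = make_filters(n_filters=4, size=3)
feat = bsif_top(patch, bank)
print(len(feat))  # 48 = 3 planes * 2**4 bins
```

With n filters each plane yields a 2^n-bin histogram, so the descriptor length grows exponentially in the filter count; this is one reason BSIF-style codes usually keep n small.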
Original language | English |
---|---|
Title of host publication | 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Publisher | IEEE |
Pages | 1846-1850 |
Number of pages | 5 |
ISBN (Electronic) | 9781479999880 |
DOIs | |
Publication status | Published - 19 May 2016 |
Event | 41st IEEE International Conference on Acoustics, Speech and Signal Processing 2016 - Shanghai International Convention Center, Shanghai, China. Duration: 20 Mar 2016 → 25 Mar 2016 |
Conference
Conference | 41st IEEE International Conference on Acoustics, Speech and Signal Processing 2016 |
---|---|
Abbreviated title | ICASSP 2016 |
Country/Territory | China |
City | Shanghai |
Period | 20/03/16 → 25/03/16 |
Keywords
- Action recognition
- BSIF
- Low quality video
- Mid-level representation
- Texture features
ASJC Scopus subject areas
- Software
- Signal Processing
- Electrical and Electronic Engineering