Abstract
Human action recognition from low quality video remains a challenging task for the action recognition community. Recent state-of-the-art methods such as space-time interest points (STIP) use shape and motion features to characterize actions. However, STIP features are over-reliant on video quality and lack robust object semantics. This paper harnesses the robustness of deeply learned object features from off-the-shelf convolutional neural network (CNN) models to improve action recognition under low quality conditions. We propose a two-channel framework that aggregates shape and motion features extracted using a STIP detector with frame-level object features obtained from the final few layers (i.e., FC6, FC7, and the softmax layer) of a state-of-the-art image-trained CNN model. Experimental results on low quality versions of two publicly available datasets, UCF-11 and HMDB51, show that using CNN object features together with conventional shape and motion features can greatly improve the performance of action recognition on low quality videos.
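The abstract describes the two-channel design only at a high level. As a rough illustration (not the authors' code), the sketch below shows how frame-level object features might be tapped from an FC7-like layer of an off-the-shelf image-trained CNN, averaged over sampled frames, and concatenated with a STIP-based descriptor. The PyTorch/torchvision setup, the choice of AlexNet, and the placeholder `stip_bow` histogram are all assumptions made for illustration.

```python
# Minimal sketch, assuming PyTorch/torchvision and a pretrained AlexNet.
# The STIP bag-of-words vector `stip_bow` is a hypothetical placeholder
# for shape/motion features computed by a STIP detector pipeline.

import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T

# Image-trained CNN; we tap the penultimate fully connected layer (FC7-like).
cnn = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
cnn.eval()

# AlexNet's classifier is a Sequential; dropping the final Linear layer
# yields 4096-d FC7 activations instead of 1000-d softmax class scores.
fc7 = torch.nn.Sequential(*list(cnn.classifier.children())[:-1])

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def video_object_feature(frames):
    """Average FC7 activations over sampled frames (H x W x 3 uint8 arrays)."""
    feats = []
    with torch.no_grad():
        for frame in frames:
            x = preprocess(frame).unsqueeze(0)            # 1 x 3 x 224 x 224
            conv = cnn.avgpool(cnn.features(x)).flatten(1)
            feats.append(fc7(conv).squeeze(0).numpy())    # 4096-d per frame
    return np.mean(feats, axis=0)

# Two-channel fusion: concatenate the CNN object descriptor with a
# (hypothetical) STIP bag-of-words histogram computed elsewhere.
frames = [np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8)
          for _ in range(8)]                              # stand-in for video frames
stip_bow = np.random.rand(4000)                           # placeholder STIP histogram
fused = np.concatenate([video_object_feature(frames), stip_bow])
print(fused.shape)  # (8096,) -> fed to a downstream classifier
```

Averaging the frame-level activations gives one fixed-length object descriptor per video regardless of its length, which makes a simple concatenation with the motion channel possible before classification.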
| Original language | English |
|---|---|
| Pages (from-to) | 11360-11364 |
| Number of pages | 5 |
| Journal | Advanced Science Letters |
| Volume | 23 |
| Issue number | 11 |
| DOIs | |
| Publication status | Published - 1 Nov 2017 |
Keywords
- Action recognition
- CNN
- Deep learning
- Feature representation
- Low quality video
- STIP
ASJC Scopus subject areas
- General Computer Science
- Health (social science)
- General Mathematics
- Education
- General Environmental Science
- General Engineering
- General Energy