Deep CNN object features for improved action recognition in low quality videos

Saimunur Rahman, John See*, Chiung Ching Ho

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)


Human action recognition from low quality video remains a challenging task for the action recognition community. Recent state-of-the-art methods such as space-time interest points (STIP) use shape and motion features to characterize actions. However, STIP features are over-reliant on video quality and lack robust object semantics. This paper harnesses the robustness of deeply learned object features from off-the-shelf convolutional neural network (CNN) models to improve action recognition under low quality conditions. A two-channel framework is proposed that aggregates shape and motion features extracted with a STIP detector, and frame-level object features obtained from the final few layers (i.e., FC6, FC7, softmax) of a state-of-the-art image-trained CNN model. Experimental results on low quality versions of two publicly available datasets, UCF-11 and HMDB51, show that combining CNN object features with conventional shape and motion features greatly improves action recognition performance in low quality videos.
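The two-channel aggregation described in the abstract can be sketched in a few lines. The sketch below is a minimal, hypothetical NumPy illustration (not the authors' implementation): it assumes frame-level CNN features (e.g., FC7 activations) and a STIP-based shape/motion histogram have already been extracted, average-pools the CNN channel over time, and concatenates the two channels into one video descriptor. Function names, dimensions, and the L2-normalisation step are illustrative assumptions.

```python
import numpy as np

def aggregate_video_features(frame_cnn_feats, stip_hist):
    """Fuse frame-level CNN object features with a STIP shape/motion
    histogram for one video (hypothetical sketch, not the paper's code).

    frame_cnn_feats: (n_frames, d_cnn) array, e.g. FC7 activations per frame
    stip_hist:       (d_stip,) bag-of-features histogram from STIP descriptors
    """
    # Channel 1: average-pool frame-level CNN features over time
    video_cnn = frame_cnn_feats.mean(axis=0)
    # L2-normalise each channel before fusion (a common choice, assumed here)
    video_cnn = video_cnn / (np.linalg.norm(video_cnn) + 1e-8)
    stip = stip_hist / (np.linalg.norm(stip_hist) + 1e-8)
    # Channel 2 fusion: concatenate into a single video descriptor
    return np.concatenate([video_cnn, stip])

# Toy example: 30 frames of 4096-d FC7 features, 1000-bin STIP histogram
feats = aggregate_video_features(np.random.rand(30, 4096),
                                 np.random.rand(1000))
print(feats.shape)  # (5096,)
```

The fused descriptor would then be fed to a standard classifier (e.g., a linear SVM) for action recognition.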

Original language: English
Pages (from-to): 11360-11364
Number of pages: 5
Journal: Advanced Science Letters
Issue number: 11
Publication status: Published - 1 Nov 2017


Keywords

  • Action recognition
  • CNN
  • Deep learning
  • Feature representation
  • Low quality video
  • STIP

ASJC Scopus subject areas

  • Computer Science (all)
  • Health (social science)
  • Mathematics (all)
  • Education
  • Environmental Science (all)
  • Engineering (all)
  • Energy (all)


