Task-Adaptive Spatial-Temporal Video Sampler for Few-shot Action Recognition

Huabin Liu, Weixian Lv, John See, Weiyao Lin*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

6 Citations (Scopus)

Abstract

A primary challenge faced in few-shot action recognition is inadequate video data for training. To address this issue, current methods in this field mainly focus on devising algorithms at the feature level, while little attention is paid to processing the input video data. Moreover, existing frame sampling strategies may omit critical action information in the temporal and spatial dimensions, which further reduces video utilization efficiency. In this paper, we propose a novel video frame sampler for few-shot action recognition, where task-specific spatial-temporal frame sampling is achieved via a temporal selector (TS) and a spatial amplifier (SA). Specifically, our sampler first scans the whole video at a small computational cost to obtain a global perception of the video frames. The TS then selects the top-T frames that contribute most significantly. Subsequently, the SA emphasizes the discriminative information of each frame by amplifying critical regions with the guidance of saliency maps. We further adopt task-adaptive learning to dynamically adjust the sampling strategy according to the episode task at hand. The implementations of both TS and SA are differentiable for end-to-end optimization, facilitating seamless integration of our proposed sampler with most few-shot action recognition methods. Extensive experiments show a significant boost in performance on various benchmarks, including long-term videos.
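The two sampling stages named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `select_top_t` and `amplify`, the per-frame `scores`, and the inverse-transform resampling are all assumptions chosen for illustration. It also uses a hard top-T selection and nearest-pixel lookup, which are not differentiable, whereas the abstract states that the actual TS and SA are differentiable.

```python
import numpy as np


def select_top_t(frames, scores, t):
    """Hypothetical temporal selection: keep the t highest-scoring frames.

    The kept indices are re-sorted so the frames stay in temporal order.
    """
    idx = np.sort(np.argsort(scores)[-t:])
    return frames[idx], idx


def amplify(frame, saliency, out_size):
    """Hypothetical spatial amplification guided by a saliency map.

    Rows and columns with more saliency mass receive more output pixels,
    so salient regions are enlarged in the resampled frame.
    """
    def sample_axis(marginal, n_out):
        # Inverse-transform sampling: map evenly spaced targets in (0, 1)
        # through the cumulative saliency to source coordinates.
        cdf = np.cumsum(marginal) / marginal.sum()
        targets = (np.arange(n_out) + 0.5) / n_out
        return np.clip(np.searchsorted(cdf, targets), 0, len(marginal) - 1)

    # A small constant keeps the marginals strictly positive.
    rows = sample_axis(saliency.sum(axis=1) + 1e-6, out_size)
    cols = sample_axis(saliency.sum(axis=0) + 1e-6, out_size)
    return frame[np.ix_(rows, cols)]
```

With eight frames and `t = 3`, `select_top_t` returns the three best-scored frames in their original order; `amplify` then stretches whichever rows and columns carry the most saliency.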

Original language: English
Title of host publication: MM '22: Proceedings of the 30th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery
Pages: 6230-6240
Number of pages: 11
ISBN (Electronic): 9781450392037
DOIs
Publication status: Published - 10 Oct 2022
Event: 30th ACM International Conference on Multimedia 2022 - Lisbon, Portugal
Duration: 10 Oct 2022 - 14 Oct 2022

Conference

Conference: 30th ACM International Conference on Multimedia 2022
Abbreviated title: MM 2022
Country/Territory: Portugal
City: Lisbon
Period: 10/10/22 - 14/10/22

Keywords

  • few-shot action recognition
  • spatial-temporal sampler

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction
  • Software
