Localization guided fight action detection in surveillance videos

Qichao Xu, John See, Weiyao Lin*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

21 Citations (Scopus)


Automatic detection of fight behaviors in surveillance videos is an important task for surveillance systems. In this work, we propose a novel localization guided framework for detecting fight actions in surveillance videos. Specifically, we exploit optical flow maps to extract motion activation information, which indicates the location of active regions. Then, a detection guided alignment module is designed to adjust the localized active regions. This approach employs a two-stream based 3D convolution network as the backbone network with a novel motion acceleration representation on the temporal stream. While most existing methods are still evaluated on three benchmark datasets which were not originally collected from surveillance scenarios, we present a novel Fight Action Detection in Surveillance-videos (FADS) dataset for this purpose. With a total of 1,520 video clips, the FADS is the largest known dataset in terms of number of surveillance videos with fight scenes. Experimental results on both the benchmark datasets and the FADS show that our proposed localization guided method outperforms state-of-the-art techniques.
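The localization step in the abstract (optical flow maps indicating active regions) and the idea of a motion acceleration signal can be illustrated with a small sketch. This is not the authors' implementation: the thresholding rule, the `thresh_ratio` parameter, the function names, and the definition of acceleration as the difference between two consecutive flow fields are all assumptions made for illustration. The flow field is assumed to be precomputed as an array of shape (H, W, 2).

```python
import numpy as np


def motion_activation_box(flow, thresh_ratio=0.5):
    """Return (x_min, y_min, x_max, y_max) of the active region, or None.

    flow: array of shape (H, W, 2) with per-pixel (dx, dy) displacements.
    thresh_ratio: hypothetical parameter; a pixel counts as active when its
    flow magnitude exceeds this fraction of the frame's peak magnitude.
    """
    magnitude = np.linalg.norm(flow, axis=-1)  # per-pixel motion strength
    peak = magnitude.max()
    if peak == 0:
        return None  # no motion anywhere in the frame
    ys, xs = np.nonzero(magnitude > thresh_ratio * peak)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def motion_acceleration(flow_prev, flow_next):
    """Assumed definition: change in flow between two consecutive steps."""
    return flow_next - flow_prev


# Synthetic example: a 3x3 patch moving right inside an 8x8 frame.
flow_a = np.zeros((8, 8, 2))
flow_a[2:5, 3:6] = (1.0, 0.0)
flow_b = 2.0 * flow_a  # the same patch moving twice as fast

box = motion_activation_box(flow_a)
accel = motion_acceleration(flow_a, flow_b)
```

In a real pipeline the box would be the candidate region passed on for alignment and classification, and the acceleration maps would be stacked over time as input to a temporal stream; both uses are described here only at the level of detail the abstract gives.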

Original language: English
Title of host publication: 2019 IEEE International Conference on Multimedia and Expo (ICME)
Number of pages: 6
ISBN (Electronic): 9781538695524
Publication status: Published - 5 Aug 2019
Event: 2019 IEEE International Conference on Multimedia and Expo - Shanghai, China
Duration: 8 Jul 2019 to 12 Jul 2019


Conference: 2019 IEEE International Conference on Multimedia and Expo
Abbreviated title: ICME 2019


Keywords

  • Action localization and recognition
  • Fight detection
  • Group behavior analysis
  • Surveillance dataset

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications

