TY - GEN
T1 - Finding action tubes with a sparse-to-dense framework
AU - Li, Yuxi
AU - Lin, Weiyao
AU - Wang, Tao
AU - See, John
AU - Qian, Rui
AU - Xu, Ning
AU - Wang, Limin
AU - Xu, Shugong
N1 - Publisher Copyright:
Copyright 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2020/4/3
Y1 - 2020/4/3
N2 - The task of spatial-temporal action detection has attracted increasing attention among researchers. Existing dominant methods solve this problem by relying on short-term information and dense serial-wise detection on each individual frame or clip. Despite their effectiveness, these methods make inadequate use of long-term information and are prone to inefficiency. In this paper, we propose, for the first time, an efficient framework that generates action tube proposals from video streams with a single forward pass in a sparse-to-dense manner. There are two key characteristics in this framework: (1) both long-term and short-term sampled information are explicitly utilized in our spatiotemporal network, and (2) a new dynamic feature sampling module (DTS) is designed to effectively approximate the tube output while keeping the system tractable. We evaluate the efficacy of our model on the UCF101-24, JHMDB-21 and UCFSports benchmark datasets, achieving promising results that are competitive with state-of-the-art methods. The proposed sparse-to-dense strategy renders our framework about 7.6 times more efficient than the nearest competitor.
AB - The task of spatial-temporal action detection has attracted increasing attention among researchers. Existing dominant methods solve this problem by relying on short-term information and dense serial-wise detection on each individual frame or clip. Despite their effectiveness, these methods make inadequate use of long-term information and are prone to inefficiency. In this paper, we propose, for the first time, an efficient framework that generates action tube proposals from video streams with a single forward pass in a sparse-to-dense manner. There are two key characteristics in this framework: (1) both long-term and short-term sampled information are explicitly utilized in our spatiotemporal network, and (2) a new dynamic feature sampling module (DTS) is designed to effectively approximate the tube output while keeping the system tractable. We evaluate the efficacy of our model on the UCF101-24, JHMDB-21 and UCFSports benchmark datasets, achieving promising results that are competitive with state-of-the-art methods. The proposed sparse-to-dense strategy renders our framework about 7.6 times more efficient than the nearest competitor.
UR - http://www.scopus.com/inward/record.url?scp=85100369474&partnerID=8YFLogxK
U2 - 10.1609/aaai.v34i07.6811
DO - 10.1609/aaai.v34i07.6811
M3 - Conference contribution
AN - SCOPUS:85100369474
SN - 9781577358350
T3 - Proceedings of the AAAI Conference on Artificial Intelligence
SP - 11466
EP - 11473
BT - Proceedings of the AAAI Conference on Artificial Intelligence 2020
PB - AAAI Press
T2 - 34th AAAI Conference on Artificial Intelligence 2020
Y2 - 7 February 2020 through 12 February 2020
ER -