Abstract
Tracking a target of interest in both sparse and crowded environments is a challenging problem, not yet successfully addressed in the literature. In this paper, we propose a new long-term visual tracking algorithm, learning discriminative correlation filters and using an online classifier, to track a target of interest in both sparse and crowded video sequences. First, we learn a translation correlation filter using a multi-layer hybrid of convolutional neural networks (CNN) and traditional hand-crafted features. Second, we include a re-detection module for overcoming tracking failures due to long-term occlusions using online SVM and Gaussian mixture probability hypothesis density (GM-PHD) filter. Finally, we learn a scale correlation filter for estimating the scale of a target by constructing a target pyramid around the estimated or re-detected position using the HOG features. We carry out extensive experiments on both sparse and dense data sets which show that our method significantly outperforms state-of-the-art methods.
Original language | English |
---|---|
Pages (from-to) | 464-476 |
Number of pages | 13 |
Journal | Journal of Visual Communication and Image Representation |
Volume | 55 |
Early online date | 7 Jul 2018 |
DOIs | |
Publication status | Published - Aug 2018 |