Abstract
This paper proposes a Scale-consistent Attention Part Network (SCAPNet) to address that issue. SCAPNet seamlessly integrates three novel modules: grid gate attention unit (gGAU), scale-consistent attention part selection (SCAPS), and part relation modeling (PRM). The gGAU module represents each grid region at a given fine scale with mid-level CNN features and produces hard attention maps using a lightweight gate based on the Gumbel-Max trick. The SCAPS module uses attention to guide part selection across multiple scales and keeps that selection consistent from one scale to the next. The PRM module uses a self-attention mechanism to model the relationships among parts based on their appearance and relative geometric positions. SCAPNet can be trained end to end and achieves state-of-the-art accuracy on several publicly available fine-grained recognition datasets (CUB-200-2011, FGVC-Aircraft, Veg200, and Fru92).
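The gGAU's hard attention rests on the Gumbel-Max trick: adding independent standard Gumbel noise to each option's logit and taking the argmax gives an exact sample from the softmax over those logits. The Python sketch below illustrates only that sampling step for a per-cell keep/drop gate. The grid of (off, on) logit pairs, the function names `gumbel_noise`, `hard_gate_map`, and `sigmoid`, and the idea that the logits come from a small layer applied to mid-level features are all assumptions for illustration, not the paper's implementation. The sketch also omits the training-time relaxation (for example Gumbel-Softmax with a straight-through estimator) that such hard gates commonly use to pass gradients.

```python
import math
import random


def gumbel_noise(rng: random.Random) -> float:
    """Sample standard Gumbel noise as -log(-log(U)) with U ~ Uniform(0, 1)."""
    u = rng.random()  # in [0, 1)
    while u == 0.0:   # log(0) is undefined, so resample the rare exact zero
        u = rng.random()
    return -math.log(-math.log(u))


def hard_gate_map(cell_logits, rng: random.Random):
    """Turn a grid of (logit_off, logit_on) pairs into a same-shaped 0/1 map.

    Hypothetical interface: each grid cell carries two logits. Perturbing both
    with independent Gumbel noise and taking the argmax samples "on" with
    probability softmax([off, on])[1] == sigmoid(on - off).
    """
    return [
        [1 if on + gumbel_noise(rng) > off + gumbel_noise(rng) else 0
         for off, on in row]
        for row in cell_logits
    ]


def sigmoid(x: float) -> float:
    """Logistic function, used here only to check the sampling rate."""
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 2x3 grid of (off, on) logits; large gaps give near-certain gates.
    logits = [[(0.0, 50.0), (0.0, -50.0), (0.0, 0.0)],
              [(1.0, 2.0), (0.0, 50.0), (3.0, -50.0)]]
    print(hard_gate_map(logits, rng))

    # Sanity check: the empirical "on" rate should approximate sigmoid(1.0), about 0.731.
    trials = 20000
    hits = sum(hard_gate_map([[(0.0, 1.0)]], rng)[0][0] for _ in range(trials))
    print(hits / trials, sigmoid(1.0))
```

With only two options per cell, the Gumbel-Max argmax reduces to comparing two noisy logits, so the gate fires with probability sigmoid(on - off); the frequency check at the end is a quick way to confirm that property of the sketch.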
Original language | English |
---|---|
Journal | IEEE Transactions on Multimedia |
DOIs | |
Publication status | E-pub ahead of print - 17 Jun 2021 |
Keywords
- attention part
- fine-grained image recognition
- Image recognition
- Location awareness
- Logic gates
- Object detection
- scale-consistent
- Task analysis
- Training
- Visualization
ASJC Scopus subject areas
- Signal Processing
- Media Technology
- Computer Science Applications
- Electrical and Electronic Engineering