Learning Scale-Consistent Attention Part Network for Fine-grained Image Recognition

Huabin Liu, Jianguo Li, Dian Li, John See, Weiyao Lin

Research output: Contribution to journal › Article › peer-review



Discriminative region localization and feature learning are crucial for fine-grained visual recognition. Existing approaches address them with attention mechanisms or part-based methods, but neglect the consistency between attention and local parts as well as the rich relational information among parts. This paper proposes a Scale-consistent Attention Part Network (SCAPNet) to address these issues. It integrates three novel modules: a grid gate attention unit (gGAU), scale-consistent attention part selection (SCAPS), and part relation modeling (PRM). The gGAU module represents grid regions at a certain fine scale with middle-layer CNN features and produces hard attention maps with a lightweight Gumbel-Max-based gate. The SCAPS module uses attention to guide part selection across multiple scales and keeps the selection scale-consistent. The PRM module uses the self-attention mechanism to model the relationships among parts based on their appearance and relative geo-positions. SCAPNet can be trained end to end and demonstrates state-of-the-art accuracy on several publicly available fine-grained recognition datasets (CUB-200-2011, FGVC-Aircraft, Veg200, and Fru92).
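The hard attention maps mentioned above rely on a Gumbel-Max gate. As a rough illustration only, the following is a minimal sketch of the general Gumbel-Max trick applied to a binary keep/drop decision per grid cell. It is not the authors' gGAU: the function names (`gumbel_noise`, `hard_gate`, `hard_attention_map`), the two-logit parameterization, and the example grid are all assumptions made for this sketch, and it covers only forward sampling, not how gradients are handled during training.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one sample from the standard Gumbel(0, 1) distribution."""
    # Clamp u away from 0 so both logarithms stay finite.
    u = max(rng.random(), 1e-12)
    return -math.log(-math.log(u))


def hard_gate(logit_on, logit_off, rng):
    """Gumbel-Max sample of a binary gate (hypothetical helper).

    Adding independent Gumbel noise to each logit and taking the argmax
    draws from the softmax distribution over the two options, so the gate
    is 1 with probability sigmoid(logit_on - logit_off).
    """
    on_score = logit_on + gumbel_noise(rng)
    off_score = logit_off + gumbel_noise(rng)
    return 1 if on_score > off_score else 0


def hard_attention_map(logits, rng):
    """Gate every cell of a 2-D grid of (logit_on, logit_off) pairs."""
    return [[hard_gate(on, off, rng) for on, off in row] for row in logits]


# Illustrative 2x2 grid of made-up logits; each cell becomes 0 or 1.
rng = random.Random(0)
grid_logits = [[(4.0, 0.0), (-4.0, 0.0)],
               [(0.0, 0.0), (2.0, -2.0)]]
mask = hard_attention_map(grid_logits, rng)
```

In this sketch, cells whose "on" logit dominates are almost always kept, while cells with equal logits are kept about half the time. The resulting binary mask plays the role of a hard attention map over grid regions.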

Original language: English
Pages (from-to): 2902-2913
Number of pages: 12
Journal: IEEE Transactions on Multimedia
Early online date: 17 Jun 2021
Publication status: Published - 2022


Keywords

  • Fine-grained image recognition
  • attention part
  • scale-consistent

ASJC Scopus subject areas

  • Signal Processing
  • Media Technology
  • Computer Science Applications
  • Electrical and Electronic Engineering

