Learning Scale-Consistent Attention Part Network for Fine-grained Image Recognition

Huabin Liu, Jianguo Li, Dian Li, John See, Weiyao Lin

Research output: Contribution to journal, Article, peer-review



This paper proposes a Scale-consistent Attention Part Network (SCAPNet) to address that issue, which seamlessly integrates three novel modules: grid gate attention unit (gGAU), scale-consistent attention part selection (SCAPS), and part relation modeling (PRM). The gGAU module represents each grid region at a certain fine scale with middle-layer CNN features and produces hard attention maps with a lightweight Gumbel-Max-based gate. The SCAPS module uses attention to guide part selection across multiple scales and keeps the selection scale-consistent. The PRM module uses the self-attention mechanism to model the relationships among parts based on their appearance and relative geo-positions. SCAPNet can be trained end to end and demonstrates state-of-the-art accuracy on several publicly available fine-grained recognition datasets (CUB-200-2011, FGVC-Aircraft, Veg200, and Fru92).
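
As an illustration of the Gumbel-Max idea mentioned in the abstract, the following is a minimal sketch of a hard 0/1 gate over a grid of cells. It is not the authors' gGAU implementation: the function names (`sample_gumbel`, `gumbel_max_gate`), the two-logit "on"/"off" parameterization, and the example logits are all assumptions made for this sketch, and the gradient handling needed for end-to-end training is omitted.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) sample via the inverse transform."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_max_gate(logits_on, logits_off, rng):
    """Return a hard 0/1 map with 1 where the perturbed 'on' logit wins.

    logits_on and logits_off are equally sized 2-D lists (hypothetical
    per-cell scores); each cell is sampled independently.
    """
    gate = []
    for row_on, row_off in zip(logits_on, logits_off):
        gate_row = []
        for l_on, l_off in zip(row_on, row_off):
            # Gumbel-Max trick: argmax over noise-perturbed logits is a
            # sample from the categorical distribution softmax(logits).
            keep = l_on + sample_gumbel(rng) > l_off + sample_gumbel(rng)
            gate_row.append(1 if keep else 0)
        gate.append(gate_row)
    return gate


# Example: a 2x2 grid with made-up logits and a fixed seed.
rng = random.Random(0)
logits_on = [[4.0, -4.0], [0.0, 4.0]]
logits_off = [[0.0, 0.0], [0.0, 0.0]]
hard_map = gumbel_max_gate(logits_on, logits_off, rng)
```

In this sketch each cell is kept with probability sigmoid(l_on - l_off), so a cell with equal logits is kept about half the time, while strongly positive or negative differences make the gate nearly deterministic.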

Original language: English
Journal: IEEE Transactions on Multimedia
Publication status: E-pub ahead of print - 17 Jun 2021

Keywords


  • attention part
  • fine-grained image recognition
  • Image recognition
  • Location awareness
  • Logic gates
  • Object detection
  • scale-consistent
  • Task analysis
  • Training
  • Visualization

ASJC Scopus subject areas

  • Signal Processing
  • Media Technology
  • Computer Science Applications
  • Electrical and Electronic Engineering

