Learning Scale-Consistent Attention Part Network for Fine-grained Image Recognition

Huabin Liu, Jianguo Li, Dian Li, John See, Weiyao Lin

Research output: Contribution to journal › Article › peer-review

27 Citations (Scopus)
259 Downloads (Pure)

Abstract

Discriminative region localization and feature learning are crucial for fine-grained visual recognition. Existing approaches address this problem with attention mechanisms or part-based methods, but they neglect the consistency between attention and local parts, as well as the rich relational information among parts. This paper proposes a Scale-Consistent Attention Part Network (SCAPNet) to address these issues, seamlessly integrating three novel modules: a grid gate attention unit (gGAU), scale-consistent attention part selection (SCAPS), and part relation modeling (PRM). The gGAU module represents grid regions at a fine scale with middle-layer CNN features and produces hard attention maps with a lightweight Gumbel-Max-based gate. The SCAPS module uses attention to guide part selection across multiple scales and keeps the selection scale-consistent. The PRM module applies self-attention to model relations among parts based on their appearance and relative geometric positions. SCAPNet can be trained end-to-end and achieves state-of-the-art accuracy on several publicly available fine-grained recognition datasets (CUB-200-2011, FGVC-Aircraft, Veg200, and Fru92).
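Below is a minimal, hypothetical PyTorch sketch of the kind of hard grid gating the gGAU module describes: each spatial cell of a middle-layer feature map receives a keep/drop score, and a hard binary mask is sampled with the straight-through Gumbel-Softmax (a differentiable relaxation of Gumbel-Max). The module name, shapes, and the 1x1-convolution scorer are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GridGateAttention(nn.Module):
        """Hypothetical sketch of a grid gate attention unit (gGAU).

        Scores each grid cell of a mid-layer CNN feature map and draws a
        hard keep/drop decision via straight-through Gumbel-Softmax.
        """
        def __init__(self, in_channels: int, tau: float = 1.0):
            super().__init__()
            # 1x1 conv gives two logits per grid cell: (drop, keep)
            self.score = nn.Conv2d(in_channels, 2, kernel_size=1)
            self.tau = tau

        def forward(self, feats: torch.Tensor) -> torch.Tensor:
            # feats: (B, C, H, W) middle-layer CNN features
            logits = self.score(feats)           # (B, 2, H, W)
            logits = logits.permute(0, 2, 3, 1)  # (B, H, W, 2)
            # Hard one-hot sample; gradients flow via the soft relaxation
            gate = F.gumbel_softmax(logits, tau=self.tau, hard=True)
            mask = gate[..., 1].unsqueeze(1)     # (B, 1, H, W) keep mask
            return feats * mask                  # non-selected cells zeroed

    # Usage on an assumed ResNet-style stage-3 feature map
    x = torch.randn(2, 512, 14, 14)
    gau = GridGateAttention(512)
    y = gau(x)  # same shape; dropped grid cells are masked out

With hard=True, the forward pass uses a discrete sample while gradients pass through the soft distribution, which is one standard way to train hard attention maps end-to-end.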

Original language: English
Pages (from-to): 2902-2913
Number of pages: 12
Journal: IEEE Transactions on Multimedia
Volume: 24
Early online date: 17 Jun 2021
DOIs
Publication status: Published - 2022

Keywords

  • Fine-grained image recognition
  • attention part
  • scale-consistent

ASJC Scopus subject areas

  • Signal Processing
  • Media Technology
  • Computer Science Applications
  • Electrical and Electronic Engineering
