Audio-Aided Learning Framework for Image Classification with Limited Training Images

Qi Wu*, Chengjia Wang, Xiaohui Li, Guangxing Wu, Marta Vallejo, Ruixuan Wang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

It is challenging to train a generalizable deep learning classifier with limited training images. Existing few-shot learning approaches try to improve classification performance largely by transferring prior knowledge from upstream large-sample tasks to the current small-sample task. Besides upstream image datasets, prior knowledge may also be obtained from signals of other modalities. In this study, we propose a novel learning framework that can utilize prior knowledge from audio signals to help train an image classifier. In the framework, a pre-trained and fixed audio encoder can transform the audio signal of each class label into a class-specific audio prototype. By attracting image representations to the corresponding audio prototypes during training of the image classifier, within-class image representations become more clustered, while image representations become further apart if they are from different classes. To the best of our knowledge, this is the first work that utilizes audio-based prior knowledge to help train an image classifier with limited training images. The proposed learning framework is compatible with existing learning approaches and can be flexibly combined with them. Extensive empirical evaluations on both natural and medical image datasets demonstrate that the proposed learning framework significantly outperforms existing methods in image classification with limited training images, thus establishing a new state of the art. The source code will be released publicly.
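The attraction step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the loss, so cosine distance is assumed here as the attraction measure, and the function names, array shapes, and the fixed prototypes are all hypothetical. In practice such a term would presumably be added to a standard classification loss with some weighting, which is also an assumption.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def prototype_attraction_loss(image_features, labels, audio_prototypes):
    """Mean cosine distance between each image feature and its class's audio prototype.

    Assumed (hypothetical) shapes:
      image_features:   (N, D) outputs of the trainable image encoder
      labels:           (N,)   integer class indices
      audio_prototypes: (C, D) one prototype per class from a frozen audio encoder
    """
    z = l2_normalize(np.asarray(image_features, dtype=float))
    p = l2_normalize(np.asarray(audio_prototypes, dtype=float))
    labels = np.asarray(labels)
    # Pick the prototype of each sample's own class and measure alignment with it.
    cos = np.sum(z * p[labels], axis=1)
    return float(np.mean(1.0 - cos))


# Toy data: two classes whose prototypes are orthogonal unit vectors.
prototypes = np.eye(2)
features = np.array([[2.0, 0.0], [0.0, 3.0]])
aligned = prototype_attraction_loss(features, [0, 1], prototypes)
mismatched = prototype_attraction_loss(features, [1, 0], prototypes)
```

Minimizing this term pulls features of the same class toward a shared prototype. Features of different classes separate only to the extent that the audio prototypes themselves are far apart.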
Original language: English
Title of host publication: 49th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: IEEE
Pages: 4975-4979
Number of pages: 5
ISBN (Electronic): 9798350344851
Publication status: Published - 18 Mar 2024
Event: 49th IEEE International Conference on Acoustics, Speech, and Signal Processing 2024 - COEX, Seoul, Korea, Republic of
Duration: 14 Apr 2024 to 19 Apr 2024
https://2024.ieeeicassp.org/

Conference

Conference: 49th IEEE International Conference on Acoustics, Speech, and Signal Processing 2024
Abbreviated title: ICASSP 2024
Country/Territory: Korea, Republic of
City: Seoul
Period: 14/04/24 to 19/04/24

Keywords

  • Audio modality
  • Few-shot learning
  • Image classification

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
