Abstract
The recent surge in deep learning methods across multiple modalities has increased interest in image captioning. Most advances in image captioning still focus on generating factual captions, which mainly describe the contents of an image. Generating captions that provide a meaningful and opinionated critique of photographs is less studied. This paper presents a framework that leverages aesthetic features encoded by an image aesthetic scorer to synthesize human-like textual critique via a sequence decoder. Experiments on a large-scale dataset show that the proposed method produces promising results on metrics relating to semantic diversity and synonymity, and qualitative observations support these findings. We also suggest the use of Word Mover’s Distance as a semantically intuitive and informative metric for this task.
Original language | English |
---|---|
Title of host publication | 2021 IEEE International Conference on Image Processing |
Publisher | IEEE |
Pages | 2523-2527 |
Number of pages | 5 |
ISBN (Electronic) | 9781665441155 |
Publication status | Published - 23 Aug 2021 |
Event | 28th IEEE International Conference on Image Processing 2021, Anchorage, United States. Duration: 19 Sept 2021 → 22 Sept 2021. https://www.2021.ieeeicip.org/ |
Conference
Conference | 28th IEEE International Conference on Image Processing 2021 |
---|---|
Abbreviated title | 2021 IEEE ICIP |
Country/Territory | United States |
City | Anchorage |
Period | 19/09/21 → 22/09/21 |
Keywords
- Aesthetic quality assessment
- Encoder-decoder network
- Image captioning
- Text synthesis
- Word mover’s distance
ASJC Scopus subject areas
- Software
- Computer Vision and Pattern Recognition
- Signal Processing