Multigap: Multi-pooled inception network with text augmentation for aesthetic prediction of photographs

Yong-Lian Hii, John See, Magzhan Kairanbay, Lai-Kuan Wong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

34 Citations (Scopus)

Abstract

With the advent of deep learning, convolutional neural networks have solved many imaging problems to a large extent. However, it remains to be seen if the image 'bottleneck' can be unplugged by harnessing complementary sources of data. In this paper, we present a new approach to image aesthetic evaluation that learns both visual and textual features simultaneously. Our network extracts visual features by appending global average pooling blocks on multiple inception modules (MultiGAP), while textual features from associated user comments are learned from a recurrent neural network. Experimental results show that the proposed method is capable of achieving state-of-the-art performance on the AVA / AVA-Comments datasets. We also demonstrate the capability of our approach in visualizing aesthetic activations.
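
The architecture described in the abstract can be illustrated with a minimal PyTorch sketch: global average pooling applied to the outputs of several inception modules of a GoogLeNet backbone, concatenated with an LSTM encoding of the associated user comments, followed by a classifier. The tapped module names (inception3b, inception4e, inception5b), hidden sizes, and two-class output below are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of the MultiGAP + text idea: pool multiple inception
# modules, encode comments with a recurrent network, fuse, and classify.
# Requires torchvision >= 0.13 for the `weights=` argument.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import googlenet


class MultiGAPWithText(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, rnn_hidden=256, num_classes=2):
        super().__init__()
        self.backbone = googlenet(weights=None, aux_logits=False)
        # Intermediate inception modules whose outputs are globally average-pooled
        # (the choice of modules here is an assumption).
        self.tap_points = ["inception3b", "inception4e", "inception5b"]
        self._feats = {}
        for name in self.tap_points:
            getattr(self.backbone, name).register_forward_hook(self._make_hook(name))

        # Text branch: embed comment tokens and encode them with an LSTM
        # (the paper learns textual features with a recurrent network).
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, rnn_hidden, batch_first=True)

        # GoogLeNet channel counts for the tapped modules: 480 + 832 + 1024,
        # plus the final LSTM hidden state.
        self.classifier = nn.Linear(480 + 832 + 1024 + rnn_hidden, num_classes)

    def _make_hook(self, name):
        def hook(_module, _inp, out):
            # Global average pooling over the spatial dimensions -> (B, C)
            self._feats[name] = F.adaptive_avg_pool2d(out, 1).flatten(1)
        return hook

    def forward(self, images, comment_tokens):
        self._feats.clear()
        _ = self.backbone(images)  # forward hooks collect the pooled features
        visual = torch.cat([self._feats[n] for n in self.tap_points], dim=1)
        _, (h_n, _) = self.rnn(self.embed(comment_tokens))
        return self.classifier(torch.cat([visual, h_n[-1]], dim=1))


# Usage sketch: a batch of two 224x224 images with padded comment token ids.
model = MultiGAPWithText(vocab_size=10000)
logits = model(torch.randn(2, 3, 224, 224), torch.randint(1, 10000, (2, 40)))
```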

Original language: English
Title of host publication: 2017 IEEE International Conference on Image Processing (ICIP)
Publisher: IEEE
Pages: 1722-1726
Number of pages: 5
ISBN (Electronic): 9781509021758
DOIs
Publication status: Published - 22 Feb 2018
Event: 24th IEEE International Conference on Image Processing 2017 - Beijing, China
Duration: 17 Sept 2017 - 20 Sept 2017

Conference

Conference: 24th IEEE International Conference on Image Processing 2017
Abbreviated title: ICIP 2017
Country/Territory: China
City: Beijing
Period: 17/09/17 - 20/09/17

Keywords

  • Aesthetic visualization
  • CNN
  • Deep neural network
  • Image aesthetics evaluation
  • Textual features

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Signal Processing
