Towards Discrete Object Representations in Vision Transformers with Tensor Products

Wei Yuen Teh, Chern Hong Lim, Mei Kuan Lim, Ian K. T. Tan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this work, we explore the use of Tensor Product Representations (TPRs) in a Vision Transformer model to form image representations that can later be used for symbolic manipulation in a neurosymbolic model. We propose the Tensor Product Vision Transformer (TP-ViT), an enhancement of a Vision Transformer that incorporates TPRs, an object representation methodology that uses filler and role vectors to represent objects. TP-ViT is the first application of TPRs to visual input. We report qualitative and quantitative results showing that TPRs allow the model to form more targeted and diverse object representations than a standard Vision Transformer.
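For readers unfamiliar with filler and role vectors, the general TPR binding operation can be sketched as follows. This is a minimal illustration of the standard tensor product formulation (each filler is bound to a role by an outer product, and the bound pairs are summed), not the TP-ViT architecture itself, which this record does not describe. The function names `outer`, `bind`, and `unbind`, the vector sizes, and the toy values are all assumptions made for the example.

```python
# Generic TPR sketch: bind fillers to roles, then recover them.
# Illustrative only; this is not the TP-ViT model from the paper.

def outer(f, r):
    """Outer product of filler f (length d_f) and role r (length d_r)."""
    return [[fi * rj for rj in r] for fi in f]

def bind(fillers, roles):
    """Build the TPR as the sum of outer(filler_i, role_i) over all pairs."""
    d_f, d_r = len(fillers[0]), len(roles[0])
    tpr = [[0.0] * d_r for _ in range(d_f)]
    for f, r in zip(fillers, roles):
        for i, row in enumerate(outer(f, r)):
            for j, value in enumerate(row):
                tpr[i][j] += value
    return tpr

def unbind(tpr, role):
    """Recover the filler bound to `role`; exact when roles are orthonormal."""
    return [sum(t * r for t, r in zip(row, role)) for row in tpr]

# Hypothetical toy data: two 3-dimensional fillers, two orthonormal roles.
fillers = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]
roles = [[1.0, 0.0], [0.0, 1.0]]

tpr = bind(fillers, roles)
recovered = [unbind(tpr, r) for r in roles]
```

Because the toy roles are orthonormal, unbinding with each role returns the corresponding filler exactly. With roles that are not orthonormal, as learned roles generally would be, recovery would only be approximate.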

Original language: English
Title of host publication: CSAI '23: Proceedings of the 2023 7th International Conference on Computer Science and Artificial Intelligence
Publisher: Association for Computing Machinery
Pages: 190-194
Number of pages: 5
ISBN (Electronic): 9798400708688
Publication status: Published - 14 Mar 2024
Event: 7th International Conference on Computer Science and Artificial Intelligence 2023 - Beijing, China
Duration: 8 Dec 2023 - 10 Dec 2023

Conference

Conference: 7th International Conference on Computer Science and Artificial Intelligence 2023
Abbreviated title: CSAI 2023
Country/Territory: China
City: Beijing
Period: 8/12/23 - 10/12/23

Keywords

  • computer vision
  • neurosymbolic AI
  • object representations
  • tensor product representations
  • vision transformer

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software
