Abstract
In this work, we explore the use of Tensor Product Representations (TPRs) in a Vision Transformer model to form image representations that can later be used for symbolic manipulation in a neurosymbolic model. We propose the Tensor Product Vision Transformer (TP-ViT), an enhancement of a Vision Transformer that incorporates TPRs, a methodology that represents objects using filler and role vectors. TP-ViT is the first application of TPRs to visual input, and we report qualitative and quantitative results that show that the use of TPRs allows for the formation of more targeted and diverse object representations when compared to a standard Vision Transformer.
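For readers unfamiliar with the filler and role vectors mentioned in the abstract, the following is a minimal sketch of the generic TPR binding operation, in which a structure is encoded as the sum of outer products $T = \sum_i f_i \otimes r_i$. It is not the TP-ViT architecture, which the abstract does not specify; the function names `bind` and `unbind`, the dimensions, and the choice of orthonormal roles are all assumptions made for illustration.

```python
import numpy as np

def bind(fillers, roles):
    """Sum of outer products f_i (x) r_i, giving a (d_filler, d_role) matrix."""
    return np.einsum("if,ir->fr", fillers, roles)

def unbind(tpr, role):
    """Recover the filler bound to `role`; exact only if roles are orthonormal."""
    return tpr @ role

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 objects, 4-dim fillers, 5-dim roles.
fillers = rng.normal(size=(3, 4))

# Assumption: orthonormal role vectors, taken from the rows of an
# orthogonal matrix produced by a QR decomposition.
q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
roles = q[:3]

tpr = bind(fillers, roles)          # shape (4, 5)
recovered = unbind(tpr, roles[1])   # filler that was bound to role 1

print(tpr.shape)
print(np.allclose(recovered, fillers[1]))
```

With orthonormal roles, unbinding is a single matrix-vector product; with merely linearly independent roles, one would unbind with dual (pseudo-inverse) role vectors instead.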
Original language | English |
---|---|
Title of host publication | CSAI '23: Proceedings of the 2023 7th International Conference on Computer Science and Artificial Intelligence |
Publisher | Association for Computing Machinery |
Pages | 190-194 |
Number of pages | 5 |
ISBN (Electronic) | 9798400708688 |
Publication status | Published - 14 Mar 2024 |
Event | 7th International Conference on Computer Science and Artificial Intelligence 2023 - Beijing, China. Duration: 8 Dec 2023 → 10 Dec 2023 |
Conference
Conference | 7th International Conference on Computer Science and Artificial Intelligence 2023 |
---|---|
Abbreviated title | CSAI 2023 |
Country/Territory | China |
City | Beijing |
Period | 8/12/23 → 10/12/23 |
Keywords
- computer vision
- neurosymbolic AI
- object representations
- tensor product representations
- vision transformer
ASJC Scopus subject areas
- Human-Computer Interaction
- Computer Networks and Communications
- Computer Vision and Pattern Recognition
- Software