Abstract
Recent work showed the possibility of building open-vocabulary large language models (LLMs) that directly operate on pixel representations. These models are implemented as autoencoders that reconstruct masked patches of rendered text. However, these pixel-based LLMs are limited to discriminative tasks (e.g., classification) and, similar to BERT, cannot be used to generate text. Therefore, they cannot be used for generative tasks such as free-form question answering. In this work, we introduce PIXAR, the first pixel-based autoregressive LLM that performs text generation. Consisting of only a decoder, PIXAR can perform free-form generative tasks while keeping the number of parameters on par with previous encoder-decoder models. Furthermore, we highlight the challenges of generating text as non-noisy images and show this is due to using a maximum likelihood objective. To overcome this problem, we propose an adversarial pretraining stage that improves the readability and accuracy of PIXAR by 8.1 on LAMBADA and 8.5 on bAbI, making it comparable to GPT-2 on text generation tasks. This paves the way to build open-vocabulary LLMs that operate on perceptual input only and calls into question the necessity of the usual symbolic input representation, i.e., text as (sub)tokens. Code is available at https://github.com/april-tools/pixar.
| Original language | English |
|---|---|
| Title of host publication | Findings of the Association for Computational Linguistics ACL 2024 |
| Editors | Lun-Wei Ku, Andre Martins, Vivek Srikumar |
| Publisher | Association for Computational Linguistics |
| Pages | 14673-14695 |
| Number of pages | 23 |
| ISBN (Electronic) | 9798891760998 |
| Publication status | Published - Aug 2024 |
| Event | 62nd Annual Meeting of the Association for Computational Linguistics 2024 - Hybrid, Bangkok, Thailand; Duration: 11 Aug 2024 → 16 Aug 2024; https://linguistlist.org/issues/35/433/ |
Conference
| Conference | 62nd Annual Meeting of the Association for Computational Linguistics 2024 |
|---|---|
| Abbreviated title | ACL 2024 |
| Country/Territory | Thailand |
| City | Bangkok |
| Period | 11/08/24 → 16/08/24 |
ASJC Scopus subject areas
- Computer Science Applications
- Linguistics and Language
- Language and Linguistics
Title
PIXAR: Auto-Regressive Language Modeling in Pixel Space