Abstract
Augmenting Large Language Models (LLMs) with image-understanding capabilities has resulted in a boom of high-performing Vision-Language models (VLMs). While studying the alignment of LLMs to human values has received widespread attention, the safety of VLMs has not received the same attention. In this paper, we explore the impact of jailbreaking on three state-of-the-art VLMs, each using a distinct modeling approach. By comparing each VLM to their respective LLM backbone, we find that each VLM is more susceptible to jailbreaking. We consider this as an undesirable outcome from visual instruction-tuning, which imposes a forgetting effect on an LLM’s safety guardrails. Therefore, we provide recommendations for future work based on evaluation strategies that aim to highlight the weaknesses of a VLM, as well as take safety measures into account during visual instruction tuning.
Original language | English |
---|---|
Title of host publication | Proceedings for the 3rd Workshop on Safety for Conversational AI, Safety4ConvAI 2024 at LREC-COLING 2024 |
Editors | Tanvi Dinkar, Giuseppe Attanasio, Amanda Cercas Curry, Ioannis Konstas, Dirk Hovy, Verena Rieser |
Publisher | European Language Resources Association |
Pages | 40-51 |
Number of pages | 12 |
ISBN (Print) | 9782493814449 |
Publication status | Published - 21 May 2024 |
Event | Joint International Conference on Computational Linguistics, Language Resources and Evaluation 2024 - Lingotto Conference Centre, Torino, Italy Duration: 20 May 2024 → 25 May 2024 https://lrec-coling-2024.org/ |
Conference
Conference | Joint International Conference on Computational Linguistics, Language Resources and Evaluation 2024 |
---|---|
Abbreviated title | LREC-COLING 2024 |
Country/Territory | Italy |
City | Torino |
Period | 20/05/24 → 25/05/24 |
Internet address |
Keywords
- Jailbreak
- Vision-Language Models
- Visual Instruction Tuning
ASJC Scopus subject areas
- Language and Linguistics
- Education
- Library and Information Sciences
- Linguistics and Language