Abstract
Online Gender-Based Violence (GBV) is an increasing problem, but existing datasets fail to capture the plurality of possible annotator perspectives or ensure representation of affected groups. In a pilot study, we revisit the annotation of a widely used dataset to investigate how annotators' identities and underlying attitudes relate to the responses they give to a sexism labelling task. We collect demographic and attitudinal information about crowd-sourced annotators using two validated surveys from Social Psychology. While we do not find any correlation between underlying attitudes and annotation behaviour, ethnicity does appear to be related to annotator responses for this pool of crowd-workers. We also conduct initial classification experiments using Large Language Models, finding that a state-of-the-art model trained with human feedback benefits from our broad data collection to perform better on the new labels. This study represents the initial stages of a wider data collection project, in which we aim to develop a taxonomy of GBV in partnership with affected stakeholders.
Original language | English |
---|---|
Title of host publication | Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP (NLPerspectives) at LREC-COLING 2024 |
Editors | Gavin Abercrombie, Valerio Basile, Davide Bernardi, Shiran Dudy, Simona Frenda, Lucy Havens, Sara Tonelli |
Publisher | European Language Resources Association |
Pages | 31-41 |
Number of pages | 11 |
ISBN (Print) | 9782493814234 |
Publication status | Published - 21 May 2024 |
Event | Joint International Conference on Computational Linguistics, Language Resources and Evaluation 2024, Lingotto Conference Centre, Torino, Italy. Duration: 20 May 2024 → 25 May 2024. https://lrec-coling-2024.org/ |
Conference
Conference | Joint International Conference on Computational Linguistics, Language Resources and Evaluation 2024 |
---|---|
Abbreviated title | LREC-COLING 2024 |
Country/Territory | Italy |
City | Torino |
Period | 20/05/24 → 25/05/24 |
Internet address | https://lrec-coling-2024.org/ |
Keywords
- Abusive language
- Annotation
- Gender-Based Violence
- Hate speech
- LLMs
- Misogyny
- Sexism
ASJC Scopus subject areas
- Language and Linguistics
- Education
- Library and Information Sciences
- Linguistics and Language