Abstract
The increasing popularity of natural language processing has led to a race to improve machine learning models that often leaves aside the core study object, the language itself. In this study, we present classification models designed to detect stereotypes related to immigrants, along with both quantitative and qualitative analyses, shedding light on linguistic distinctions in how humans and various models perceive stereotypes. Given the subjective nature of this task, one of the models incorporates the judgments of all annotators by utilizing soft labels. Through a comparative analysis of BERT-based models using both hard and soft labels, along with predictions from GPT-4, we gain a clearer understanding of the linguistic challenges posed by texts containing stereotypes. Our dataset comprises Spanish Twitter posts collected as responses to immigrant-related hoaxes, annotated with binary values indicating the presence of stereotypes, implicitness, and the requirement for conversational context to understand the stereotype. Our findings suggest that both model prediction confidence and inter-annotator agreement are higher for explicit stereotypes, while stereotypes conveyed through irony and other figures of speech prove more challenging to detect than other implicit stereotypes.
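The abstract does not say how the soft labels are built, so the following is only an illustrative sketch of the general idea and not the authors' implementation. It assumes a soft label is the fraction of annotators who marked a text as containing a stereotype and a hard label is the majority vote. The vote values, the function names, and the tie-breaking rule are all hypothetical.

```python
import math

def soft_label(votes):
    """Fraction of annotators who marked the text as containing a stereotype."""
    return sum(votes) / len(votes)

def hard_label(votes):
    """Majority vote; ties resolve to 0 here (an arbitrary, assumed choice)."""
    return 1 if sum(votes) * 2 > len(votes) else 0

def binary_cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a (possibly soft) target and a predicted probability."""
    p = min(max(predicted, eps), 1 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

# Hypothetical example: three annotators, two of whom saw a stereotype.
votes = [1, 1, 0]
soft = soft_label(votes)   # keeps the disagreement as a probability
hard = hard_label(votes)   # collapses the disagreement to a single class

# Compare a cautious prediction with an overconfident one under each target.
for name, target in (("soft", soft), ("hard", hard)):
    cautious = binary_cross_entropy(target, 2 / 3)
    confident = binary_cross_entropy(target, 0.99)
    print(f"{name}: cautious={cautious:.3f} confident={confident:.3f}")
```

Under these assumptions, the hard target rewards the overconfident prediction, while the soft target penalizes it on an item the annotators disagreed about. This is one way a model trained on soft labels can reflect annotator uncertainty.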
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) |
| Editors | Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue |
| Publisher | European Language Resources Association (ELRA) |
| Pages | 8453-8463 |
| Number of pages | 11 |
| ISBN (Print) | 9782493814104 |
| Publication status | Published - 20 May 2024 |
| Event | Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation 2024. Venue: Lingotto Conference Centre, Hybrid, Torino, Italy. Duration: 20 May 2024 → 25 May 2024. https://lrec-coling-2024.org/ |
Publication series
| Name | International conference on computational linguistics |
|---|---|
| Publisher | ACL Anthology |
| ISSN (Print) | 2951-2093 |
Conference
| Conference | Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation 2024 |
|---|---|
| Abbreviated title | LREC-COLING 2024 |
| Country/Territory | Italy |
| City | Hybrid, Torino |
| Period | 20/05/24 → 25/05/24 |
| Internet address | https://lrec-coling-2024.org/ |
Keywords
- Annotation
- Disagreement
- Immigration
- Stereotype Detection
ASJC Scopus subject areas
- Theoretical Computer Science
- Computational Theory and Mathematics
- Computer Science Applications