QUEEREOTYPES: A Multi-Source Italian Corpus of Stereotypes towards LGBTQIA+ Community Members

  • Alessandra Teresa Cignarella
  • , Manuela Sanguinetti
  • , Simona Frenda
  • , Andrea Marra
  • , Cristina Bosco
  • , Valerio Basile

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Citations (Scopus)

Abstract

The paper describes a dataset composed of two sub-corpora from two different sources in Italian. The QUEEREOTYPES corpus includes social media texts regarding LGBTQIA+ individuals, behaviors, ideology and events. The texts were collected from Facebook and Twitter in 2018 and were annotated for the presence of stereotypes, and orthogonal dimensions (such as hate speech, aggressiveness, offensiveness, and irony in one sub-corpus, and stance in the other). The resource was developed by Natural Language Processing researchers together with activists from an Italian LGBTQIA+ not-for-profit organization. The creation of the dataset allows the NLP community to study stereotypes against marginalized groups, individuals and, ultimately, to develop proper tools and measures to reduce the online spread of such stereotypes. A test for the robustness of the language resource has been performed by means of 5-fold cross-validation experiments. Finally, text classification experiments have been carried out with a fine-tuned version of AlBERTo (a BERT-based model pre-trained on Italian tweets) and mBERT, obtaining good results on the task of stereotype detection, suggesting that stereotypes towards different targets might share common traits.

Original languageEnglish
Title of host publicationProceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
EditorsNicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
PublisherEuropean Language Resources Association (ELRA)
Pages13429-13441
Number of pages13
ISBN (Print)9782493814104
Publication statusPublished - 20 May 2024
EventJoint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation 2024 - Lingotto Conference Centre, Hybrid, Torino, Italy
Duration: 20 May 202425 May 2024
https://lrec-coling-2024.org/

Publication series

Name Joint International Conference on Computational Linguistics, Language Resources and Evaluation
PublisherACL
ISSN (Print)2522-2686
ISSN (Electronic)2951-2093

Conference

ConferenceJoint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation 2024
Abbreviated titleLREC-COLING 2024
Country/TerritoryItaly
CityHybrid, Torino
Period20/05/2425/05/24
Internet address

Keywords

  • Corpus
  • Italian
  • LGBTQIA+
  • Stereotypes

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computational Theory and Mathematics
  • Computer Science Applications

Fingerprint

Dive into the research topics of 'QUEEREOTYPES: A Multi-Source Italian Corpus of Stereotypes towards LGBTQIA+ Community Members'. Together they form a unique fingerprint.

Cite this