TY - GEN
T1 - SemEval-2023 Task 11: Learning with Disagreements (LeWiDi)
T2 - 17th International Workshop on Semantic Evaluation 2023
AU - Leonardelli, Elisa
AU - Abercrombie, Gavin
AU - Almanea, Dina
AU - Basile, Valerio
AU - Fornaciari, Tommaso
AU - Plank, Barbara
AU - Rieser, Verena
AU - Uma, Alexandra
AU - Poesio, Massimo
N1 - Funding Information:
Elisa Leonardelli was partially supported by the StandByMe European project (REC-RDAP-GBV-AG-2020) on “Stop online violence against women and girls by changing attitudes and behaviour of young people through human rights education” (GA 101005641) and by the StandByMe 2.0 project (CERV-2021-DAPHNE) on “Stop gender-based violence by addressing masculinities and changing behaviour of young people through human rights education” (GA 101049386). Alexandra Uma and Massimo Poesio were partially supported by the DALI project, ERC Advanced Grant 695662. Massimo Poesio was also partially supported by the ARCIDUCA project, EPSRC grant number EP/W001632/1. Gavin Abercrombie and Verena Rieser were supported by the EPSRC projects ‘Gender Bias in Conversational AI’ (EP/T023767/1) and ‘Equally Safe Online’ (EP/W025493/1), and Verena Rieser was also supported by ‘AISEC: AI Secure and Explainable by Construction’ (EP/T026952/1). Valerio Basile was partially supported by the project "Toxic Language Understanding in Online Communication - BREAKhateDOWN" funded by Compagnia di San Paolo (ex-post 2020). Barbara Plank was partially supported by the DIALECT project, ERC Consolidator Grant 101043235.
Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023/7
Y1 - 2023/7
N2 - The paper contains examples which are offensive in nature. NLP datasets annotated with human judgments are rife with disagreements between the judges. This is especially true for tasks depending on subjective judgments such as sentiment analysis or offensive language detection. Particularly in these latter cases, the NLP community has come to realize that the approach of ‘reconciling’ these different subjective interpretations is inappropriate. Many NLP researchers have therefore concluded that rather than eliminating disagreements from annotated corpora, we should preserve them; indeed, some argue that corpora should aim to preserve all annotator judgments. But this approach to corpus creation for NLP has not yet been widely accepted. The objective of the LeWiDi series of shared tasks is to promote this approach to developing NLP models by providing a unified framework for training and evaluating with such datasets. We report on the second LeWiDi shared task, which differs from the first edition in three crucial respects: (i) it focuses entirely on NLP, instead of both NLP and computer vision tasks as in its first edition; (ii) it focuses on subjective tasks, instead of covering different types of disagreements, as training with aggregated labels for subjective NLP tasks is a particularly obvious misrepresentation of the data; and (iii) for the evaluation, we concentrate on soft approaches. This second edition of LeWiDi attracted a wide array of participants, resulting in 13 shared task submission papers.
AB - The paper contains examples which are offensive in nature. NLP datasets annotated with human judgments are rife with disagreements between the judges. This is especially true for tasks depending on subjective judgments such as sentiment analysis or offensive language detection. Particularly in these latter cases, the NLP community has come to realize that the approach of ‘reconciling’ these different subjective interpretations is inappropriate. Many NLP researchers have therefore concluded that rather than eliminating disagreements from annotated corpora, we should preserve them; indeed, some argue that corpora should aim to preserve all annotator judgments. But this approach to corpus creation for NLP has not yet been widely accepted. The objective of the LeWiDi series of shared tasks is to promote this approach to developing NLP models by providing a unified framework for training and evaluating with such datasets. We report on the second LeWiDi shared task, which differs from the first edition in three crucial respects: (i) it focuses entirely on NLP, instead of both NLP and computer vision tasks as in its first edition; (ii) it focuses on subjective tasks, instead of covering different types of disagreements, as training with aggregated labels for subjective NLP tasks is a particularly obvious misrepresentation of the data; and (iii) for the evaluation, we concentrate on soft approaches. This second edition of LeWiDi attracted a wide array of participants, resulting in 13 shared task submission papers.
UR - http://www.scopus.com/inward/record.url?scp=85175380895&partnerID=8YFLogxK
U2 - 10.18653/v1/2023.semeval-1.314
DO - 10.18653/v1/2023.semeval-1.314
M3 - Conference contribution
AN - SCOPUS:85175380895
SP - 2304
EP - 2318
BT - Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
PB - Association for Computational Linguistics
Y2 - 13 July 2023 through 14 July 2023
ER -