Abstract
Growing awareness of a ‘Reproducibility Crisis’ in natural language processing (NLP) has so far focused on human evaluations of generative systems. Although labelling for supervised classification tasks makes up a large part of human input to NLP systems, the reproduction of such labelling efforts has thus far not been explored. In this paper, we re-implement a human data collection study for sentiment analysis of code-mixed Malayalam movie reviews, as well as the accompanying automated classification experiments. We find that missing and under-specified information makes reproduction challenging, and we observe potentially consequential differences between the original labels and those we collect. Classification results indicate that the reliability of the labels is important for stable performance.
| Original language | English |
| --- | --- |
| Title of host publication | Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) at LREC-COLING 2024 |
| Editors | Simone Balloccu, Anya Belz, Rudali Huidrom, Ehud Reiter, Joao Sedoc, Craig Thomson |
| Publisher | European Language Resources Association |
| Pages | 114-124 |
| Number of pages | 11 |
| ISBN (Print) | 9782493814418 |
| Publication status | Published - 21 May 2024 |
| Event | 4th Workshop on Human Evaluation of NLP Systems 2024, Torino, Italy. Duration: 21 May 2024 → … |
Conference
| Conference | 4th Workshop on Human Evaluation of NLP Systems 2024 |
| --- | --- |
| Abbreviated title | HumEval 2024 |
| Country/Territory | Italy |
| City | Torino |
| Period | 21/05/24 → … |
Keywords
- Human Data Collection
- Malayalam
- Reproducibility
- Sentiment Analysis
ASJC Scopus subject areas
- Education
- Language and Linguistics
- Library and Information Sciences
- Linguistics and Language