Exploring Reproducibility of Human-Labelled Data for Code-Mixed Sentiment Analysis

Sachin Sasidharan Nair, Tanvi Dinkar, Gavin Abercrombie

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Growing awareness of a ‘Reproducibility Crisis’ in natural language processing (NLP) has focused on human evaluations of generative systems. While labelling for supervised classification tasks makes up a large part of human input to systems, the reproduction of such efforts has thus far not been explored. In this paper, we re-implement a human data collection study for sentiment analysis of code-mixed Malayalam movie reviews, as well as automated classification experiments. We find that missing and under-specified information makes reproduction challenging, and we observe potentially consequential differences between the original labels and those we collect. Classification results indicate that the reliability of the labels is important for stable performance.
Original language: English
Title of host publication: Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) at LREC-COLING 2024
Editors: Simone Balloccu, Anya Belz, Rudali Huidrom, Ehud Reiter, Joao Sedoc, Craig Thomson
Publisher: ELRA Language Resources Association
Pages: 114-124
Number of pages: 11
ISBN (Print): 9782493814418
Publication status: Published - 21 May 2024
Event: 4th Workshop on Human Evaluation of NLP Systems 2024 - Torino, Italy
Duration: 21 May 2024 → …

Conference

Conference: 4th Workshop on Human Evaluation of NLP Systems 2024
Abbreviated title: HumEval 2024
Country/Territory: Italy
City: Torino
Period: 21/05/24 → …

Keywords

  • Human Data Collection
  • Malayalam
  • Reproducibility
  • Sentiment Analysis

ASJC Scopus subject areas

  • Education
  • Language and Linguistics
  • Library and Information Sciences
  • Linguistics and Language
