Voices in a Crowd: Searching for clusters of unique perspectives

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Language models have been shown to reproduce biases present in their training data, which by default reflect the majority perspective. Proposed solutions aim to capture minority perspectives by either modelling annotator disagreements or grouping annotators based on shared metadata, both of which face significant challenges. We propose a framework that trains models without encoding annotator metadata, extracts latent embeddings informed by annotator behaviour, and creates clusters of similar opinions, which we refer to as voices. The resulting clusters are validated post hoc via internal and external quantitative metrics, as well as a qualitative analysis that identifies the type of voice each cluster represents. Our results demonstrate the strong generalisation capability of our framework: the resulting clusters are adequately robust and capture minority perspectives based on different demographic factors across two distinct datasets.
Original language: English
Title of host publication: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Publisher: Association for Computational Linguistics
Pages: 12517-12539
Number of pages: 23
ISBN (Electronic): 9798891761643
Publication status: Published - 12 Nov 2024
Event: 2024 Conference on Empirical Methods in Natural Language Processing - Hybrid, Miami, United States
Duration: 12 Nov 2024 - 16 Nov 2024
https://2024.emnlp.org/

Conference

Conference: 2024 Conference on Empirical Methods in Natural Language Processing
Abbreviated title: EMNLP 2024
Country/Territory: United States
City: Hybrid, Miami
Period: 12/11/24 - 16/11/24

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
  • Linguistics and Language
