Multi-party Goal Tracking with LLMs: Comparing Pre-training, Fine-tuning, and Prompt Engineering

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


Abstract

This paper evaluates the extent to which current LLMs can capture task-oriented multi-party conversations (MPCs). We have recorded and transcribed 29 MPCs between patients, their companions, and a social robot in a hospital. We then annotated this corpus for multi-party goal-tracking and intent-slot recognition. People share goals, answer each other’s goals, and provide other people’s goals in MPCs - none of which occur in dyadic interactions. To understand user goals in MPCs, we compared three methods in zero-shot and few-shot settings: we fine-tuned T5, created pre-training tasks to train DialogLM using LED, and employed prompt engineering techniques with GPT-3.5-turbo, to determine which approach can complete this novel task with limited data. GPT-3.5-turbo significantly outperformed the others in a few-shot setting. The ‘reasoning’ style prompt, when given 7% of the corpus as example annotated conversations, was the best performing method. It correctly annotated 62.32% of the goal tracking MPCs, and 69.57% of the intent-slot recognition MPCs. A ‘story’ style prompt increased model hallucination, which could be detrimental if deployed in safety-critical settings. We conclude that multi-party conversations still challenge state-of-the-art LLMs.
Original language: English
Title of host publication: Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Place of Publication: Prague, Czechia
Publisher: Association for Computational Linguistics
Pages: 229–241
Number of pages: 13
ISBN (Electronic): 9798891760288
Publication status: Published - Sept 2023

Keywords

  • cs.CL
  • cs.HC
