Extrinsic versus intrinsic evaluation of natural language generation for spoken dialogue systems and social robotics

Helen Hastie, Heriberto Cuayáhuitl, Nina Dethlefs, Simon Keizer, Xingkun Liu

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Abstract

In the past 10 years, very few published studies have included any kind of extrinsic evaluation of an NLG component in an end-to-end system, be it for phone or mobile-based dialogues or social robotic interaction. This may be attributed to the fact that these types of evaluations are very costly to set up and run for a single component. The question therefore arises whether there is anything to be gained over and above intrinsic quality measures obtained in off-line experiments. In this article, we describe a case study of evaluating two variants of an NLG surface realiser and show that there are significant differences in both extrinsic measures and intrinsic measures. These differences can be used to inform further iterations of component and system development.

Original language: English
Title of host publication: Dialogues with Social Robots
Subtitle of host publication: Enablements, Analyses, and Evaluation
Editors: Kristiina Jokinen, Graham Wilcock
Publisher: Springer
Pages: 303-311
Number of pages: 9
Volume: Part V
ISBN (Electronic): 9789811025853
ISBN (Print): 9789811025846
DOI: https://doi.org/10.1007/978-981-10-2585-3_24
Publication status: Published - 25 Dec 2016
Event: 7th International Workshop on Spoken Dialogue Systems 2016 - Riekonlinna, Saariselkä, Finland
Duration: 13 Jan 2016 to 16 Jan 2016

Publication series

Name: Lecture Notes in Electrical Engineering
Publisher: Springer
Volume: 999
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: 7th International Workshop on Spoken Dialogue Systems 2016
Abbreviated title: IWSDS 2016
Country: Finland
City: Saariselkä
Period: 13/01/16 to 16/01/16

Keywords

  • Evaluation
  • Natural language generation
  • Spoken dialogue systems

ASJC Scopus subject areas

  • Industrial and Manufacturing Engineering

Cite this

Hastie, H., Cuayáhuitl, H., Dethlefs, N., Keizer, S., & Liu, X. (2016). Extrinsic versus intrinsic evaluation of natural language generation for spoken dialogue systems and social robotics. In K. Jokinen & G. Wilcock (Eds.), Dialogues with Social Robots: Enablements, Analyses, and Evaluation (Vol. Part V, pp. 303-311). (Lecture Notes in Electrical Engineering; Vol. 999). Springer. https://doi.org/10.1007/978-981-10-2585-3_24