Extrinsic versus intrinsic evaluation of natural language generation for spoken dialogue systems and social robotics

Helen Hastie, Heriberto Cuayáhuitl, Nina Dethlefs, Simon Keizer, Xingkun Liu

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

In the past 10 years, very few published studies have included any kind of extrinsic evaluation of an NLG component in an end-to-end system, be it for phone or mobile-based dialogues or social robotic interaction. This may be attributed to the fact that these types of evaluations are very costly to set up and run for a single component. The question therefore arises whether there is anything to be gained over and above intrinsic quality measures obtained in off-line experiments. In this article, we describe a case study of evaluating two variants of an NLG surface realiser and show that there are significant differences in both extrinsic measures and intrinsic measures. These differences can be used to inform further iterations of component and system development.

Language: English
Title of host publication: Dialogues with Social Robots
Subtitle of host publication: Enablements, Analyses, and Evaluation
Editors: Kristiina Jokinen, Graham Wilcock
Publisher: Springer
Pages: 303-311
Number of pages: 9
Volume: Part V
ISBN (Electronic): 9789811025853
ISBN (Print): 9789811025846
DOI: https://doi.org/10.1007/978-981-10-2585-3_24
Publication status: Published - 25 Dec 2016
Event: 7th International Workshop on Spoken Dialogue Systems 2016 - Riekonlinna, Saariselkä, Finland
Duration: 13 Jan 2016 to 16 Jan 2016

Publication series

Name: Lecture Notes in Electrical Engineering
Publisher: Springer
Volume: 999
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: 7th International Workshop on Spoken Dialogue Systems 2016
Abbreviated title: IWSDS 2016
Country: Finland
City: Saariselkä
Period: 13/01/16 to 16/01/16

Fingerprint

Robotics
Experiments

Keywords

  • Evaluation
  • Natural language generation
  • Spoken dialogue systems

ASJC Scopus subject areas

  • Industrial and Manufacturing Engineering

Cite this

Hastie, H., Cuayáhuitl, H., Dethlefs, N., Keizer, S., & Liu, X. (2016). Extrinsic versus intrinsic evaluation of natural language generation for spoken dialogue systems and social robotics. In K. Jokinen, & G. Wilcock (Eds.), Dialogues with Social Robots: Enablements, Analyses, and Evaluation (Vol. Part V, pp. 303-311). (Lecture Notes in Electrical Engineering; Vol. 999). Springer. https://doi.org/10.1007/978-981-10-2585-3_24