Crowd-sourcing NLG Data: Pictures Elicit Better Data

Jekaterina Novikova, Oliver Lemon, Verena Rieser

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Recent advances in corpus-based Natural Language Generation (NLG) hold the promise of being easily portable across domains, but require costly training data, consisting of meaning representations (MRs) paired with Natural Language (NL) utterances. In this work, we propose a novel framework for crowd-sourcing high quality NLG training data, using automatic quality control measures and evaluating different MRs with which to elicit data. We show that pictorial MRs result in better NL data being collected than logic-based MRs: utterances elicited by pictorial MRs are judged as significantly more natural, more informative, and better phrased, with a significant increase in average quality ratings (around 0.5 points on a 6-point scale), compared to using the logical MRs. As the MR becomes more complex, the benefits of pictorial stimuli increase. The collected data will be released as part of this submission.
Original language: English
Title of host publication: Proceedings of the 9th International Natural Language Generation conference
Publisher: Association for Computational Linguistics
Pages: 265-273
Publication status: Published - 2016
Event: 9th International Natural Language Generation Conference - University of Edinburgh building at 50 George Square, Edinburgh, United Kingdom
Duration: 5 Sep 2016 to 8 Sep 2016
http://www.macs.hw.ac.uk/InteractionLab/INLG2016/index.html

Conference

Conference: 9th International Natural Language Generation Conference
Abbreviated title: INLG 2016
Country: United Kingdom
City: Edinburgh
Period: 5/09/16 to 8/09/16

Cite this

Novikova, J., Lemon, O., & Rieser, V. (2016). Crowd-sourcing NLG Data: Pictures Elicit Better Data. In Proceedings of the 9th International Natural Language Generation conference (pp. 265-273). Association for Computational Linguistics.
Novikova, Jekaterina ; Lemon, Oliver ; Rieser, Verena. / Crowd-sourcing NLG Data: Pictures Elicit Better Data. Proceedings of the 9th International Natural Language Generation conference. Association for Computational Linguistics, 2016. pp. 265-273
@inproceedings{a252c69b0a7f415380f360377468fdcc,
title = "Crowd-sourcing NLG Data: Pictures Elicit Better Data",
abstract = "Recent advances in corpus-based Natural Language Generation (NLG) hold the promise of being easily portable across domains, but require costly training data, consisting of meaning representations (MRs) paired with Natural Language (NL) utterances. In this work, we propose a novel framework for crowd-sourcing high quality NLG training data, using automatic quality control measures and evaluating different MRs with which to elicit data. We show that pictorial MRs result in better NL data being collected than logic-based MRs: utterances elicited by pictorial MRs are judged as significantly more natural, more informative, and better phrased, with a significant increase in average quality ratings (around 0.5 points on a 6-point scale), compared to using the logical MRs. As the MR becomes more complex, the benefits of pictorial stimuli increase. The collected data will be released as part of this submission.",
author = "Jekaterina Novikova and Oliver Lemon and Verena Rieser",
year = "2016",
language = "English",
pages = "265--273",
booktitle = "Proceedings of the 9th International Natural Language Generation conference",
publisher = "Association for Computational Linguistics",

}

Novikova, J, Lemon, O & Rieser, V 2016, Crowd-sourcing NLG Data: Pictures Elicit Better Data. in Proceedings of the 9th International Natural Language Generation conference. Association for Computational Linguistics, pp. 265-273, 9th International Natural Language Generation Conference, Edinburgh, United Kingdom, 5/09/16.

TY - GEN

T1 - Crowd-sourcing NLG Data: Pictures Elicit Better Data

AU - Novikova, Jekaterina

AU - Lemon, Oliver

AU - Rieser, Verena

PY - 2016

Y1 - 2016

AB - Recent advances in corpus-based Natural Language Generation (NLG) hold the promise of being easily portable across domains, but require costly training data, consisting of meaning representations (MRs) paired with Natural Language (NL) utterances. In this work, we propose a novel framework for crowd-sourcing high quality NLG training data, using automatic quality control measures and evaluating different MRs with which to elicit data. We show that pictorial MRs result in better NL data being collected than logic-based MRs: utterances elicited by pictorial MRs are judged as significantly more natural, more informative, and better phrased, with a significant increase in average quality ratings (around 0.5 points on a 6-point scale), compared to using the logical MRs. As the MR becomes more complex, the benefits of pictorial stimuli increase. The collected data will be released as part of this submission.

M3 - Conference contribution

SP - 265

EP - 273

BT - Proceedings of the 9th International Natural Language Generation conference

PB - Association for Computational Linguistics

ER -

Novikova J, Lemon O, Rieser V. Crowd-sourcing NLG Data: Pictures Elicit Better Data. In Proceedings of the 9th International Natural Language Generation conference. Association for Computational Linguistics. 2016. p. 265-273