Crowd-sourcing NLG Data: Pictures Elicit Better Data

Jekaterina Novikova, Oliver Lemon, Verena Rieser

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

40 Citations (Scopus)


Recent advances in corpus-based Natural Language Generation (NLG) hold the promise of being easily portable across domains, but require costly training data, consisting of meaning representations (MRs) paired with Natural Language (NL) utterances. In this work, we propose a novel framework for crowd-sourcing high quality NLG training data, using automatic quality control measures and evaluating different MRs with which to elicit data. We show that pictorial MRs result in better NL data being collected than logic-based MRs: utterances elicited by pictorial MRs are judged as significantly more natural, more informative, and better phrased, with a significant increase in average quality ratings (around 0.5 points on a 6-point scale), compared to using the logical MRs. As the MR becomes more complex, the benefits of pictorial stimuli increase. The collected data will be released as part of this submission.
Original language: English
Title of host publication: Proceedings of the 9th International Natural Language Generation Conference
Publisher: Association for Computational Linguistics
Publication status: Published - 2016
Event: 9th International Natural Language Generation Conference - University of Edinburgh building at 50 George Square, Edinburgh, United Kingdom
Duration: 5 Sept 2016 - 8 Sept 2016


Conference: 9th International Natural Language Generation Conference
Abbreviated title: INLG 2016
Country/Territory: United Kingdom


