The E2E Dataset: New Challenges For End-to-End Generation

Jekaterina Novikova, Ondrej Dusek, Verena Rieser

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

204 Citations (Scopus)
84 Downloads (Pure)


This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area. The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from this set requires content selection. As such, learning from this dataset promises more natural, varied and less template-like system utterances. We also establish a baseline on this dataset, which illustrates some of the difficulties associated with this data.
Original language: English
Title of host publication: Proceedings of the SIGDIAL 2017 Conference
Place of Publication: Stroudsburg, PA, USA
Publisher: Association for Computational Linguistics
Number of pages: 6
ISBN (Electronic): 978-1-945626-82-1
Publication status: Published - 16 Aug 2017
Event: 18th Annual Meeting of the Special Interest Group on Discourse and Dialogue - Universität des Saarlandes, Saarbrücken, Germany
Duration: 15 Aug 2017 to 17 Aug 2017


Conference: 18th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Abbreviated title: SIGDIAL 2017


