The E2E Dataset: New Challenges For End-to-End Generation

Jekaterina Novikova, Ondrej Dusek, Verena Rieser

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution


Abstract

This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area. The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from this set requires content selection. As such, learning from this dataset promises more natural, varied and less template-like system utterances. We also establish a baseline on this dataset, which illustrates some of the difficulties associated with this data.
Original language: English
Title of host publication: Proceedings of the SIGDIAL 2017 Conference
Place of Publication: Stroudsburg, PA, USA
Publisher: Association for Computational Linguistics
Pages: 201-206
Number of pages: 6
ISBN (Electronic): 978-1-945626-82-1
Publication status: Published - 16 Aug 2017
Event: 18th Annual Meeting of the Special Interest Group on Discourse and Dialogue - Universität des Saarlandes, Saarbrücken, Germany
Duration: 15 Aug 2017 to 17 Aug 2017
http://www.sigdial.org/workshops/conference18/

Conference

Conference: 18th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Abbreviated title: SIGDIAL 2017
Country: Germany
City: Saarbrücken
Period: 15/08/17 to 17/08/17

Datasets

The E2E Challenge Dataset

Novikova, J. (Creator), Dusek, O. (Creator) & Rieser, V. (Creator), Heriot-Watt University, Nov 2017

Dataset

Cite this

Novikova, J., Dusek, O., & Rieser, V. (2017). The E2E Dataset: New Challenges For End-to-End Generation. In Proceedings of the SIGDIAL 2017 Conference (pp. 201-206). Association for Computational Linguistics. http://www.sigdial.org/workshops/conference18/proceedings/pdf/SIGDIAL25.pdf