The E2E Challenge Dataset

  • Jekaterina Novikova (Creator)
  • Ondrej Dusek (Creator)
  • Verena Rieser (Creator)



The E2E dataset is a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain. It is ten times bigger than existing, frequently used datasets in this area (>5k distinct meaning representations with >50k corresponding natural language reference texts).

The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from this set requires content selection.
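
The pairing of each meaning representation (MR) with several reference texts can be illustrated with a minimal sketch. It assumes MRs are written as comma-separated slot[value] pairs; this page does not show the format, so check it against the downloaded files. The helper names parse_mr and group_references and the sample rows are hypothetical and are not taken from the dataset.

```python
import re
from collections import defaultdict

# Assumption: an MR is a comma-separated list of slot[value] pairs,
# e.g. "name[The Eagle], eatType[pub]". Verify against the actual files.
_SLOT = re.compile(r"\s*([^\[\],]+?)\s*\[([^\]]*)\]")


def parse_mr(mr):
    """Turn an MR string into a dict mapping slot names to values."""
    return {slot: value for slot, value in _SLOT.findall(mr)}


def group_references(pairs):
    """Map each distinct MR string to the list of reference texts paired with it."""
    grouped = defaultdict(list)
    for mr, ref in pairs:
        grouped[mr].append(ref)
    return dict(grouped)


# Hypothetical rows for illustration only; they are not quoted from the dataset.
sample_pairs = [
    ("name[The Eagle], eatType[pub]", "The Eagle is a pub."),
    ("name[The Eagle], eatType[pub]", "There is a pub called The Eagle."),
    ("name[Blue Spice], food[Chinese]", "Blue Spice serves Chinese food."),
]

if __name__ == "__main__":
    for mr, refs in group_references(sample_pairs).items():
        print(parse_mr(mr), len(refs))
```

Grouping by the raw MR string keeps the one-to-many relation between MRs and references explicit, which is the structure behind the ">5k MRs, >50k references" figures above.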
Date made available: Nov 2017
Publisher: Heriot-Watt University
Date of data production: 2017

Research Output

  • 1 Conference contribution

The E2E Dataset: New Challenges For End-to-End Generation

Novikova, J., Dusek, O. & Rieser, V., 16 Aug 2017, Proceedings of the SIGDIAL 2017 Conference. Stroudsburg, PA, USA: Association for Computational Linguistics, p. 201-206 6 p.

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Open Access

Cite this

Novikova, J. (Creator), Dusek, O. (Creator), Rieser, V. (Creator) (Nov 2017). The E2E Challenge Dataset. Heriot-Watt University. e2e_dataset(.zip). 10.17861/c07fe36b-d0fa-4bbe-967c-f8b4341a133f