Human Ratings of Natural Language Generation Outputs

  • Jekaterina Novikova (Creator)
  • Ondrej Dusek (Creator)
  • Amanda Cercas Curry (Creator)
  • Verena Rieser (Creator)



This dataset comprises Likert-scale human ratings of texts produced by three recent data-driven natural language generation (NLG) systems over three different datasets, as provided to us by the systems’ authors. We collected 3 or more ratings per text for informativeness, naturalness, and overall quality of the NLG-produced text, given the source meaning representation. The ratings were used to evaluate current automatic metrics for NLG and to motivate the development of new, improved ones.
Date made available: 2017
Publisher: Heriot-Watt University
Date of data production: 2017
