This dataset comprises Likert-scale human ratings of texts produced by three recent data-driven natural language generation (NLG) systems over three different datasets, as provided to us by the systems’ authors. We collected three or more ratings per text for informativeness, naturalness, and overall quality, given the source meaning representation. The ratings were used to evaluate current automatic metrics for NLG and to motivate the development of new, improved ones.
|Date made available|2017|
|Date of data production|2017|
Novikova, J. (Creator), Dusek, O. (Creator), Cercas Curry, A. (Creator), Rieser, V. (Creator) (2017). Human Ratings of Natural Language Generation Outputs. Heriot-Watt University. NLG_human_ratings(.zip).