Automatic Quality Estimation for Natural Language Generation: Ranting (Jointly Rating and Ranking)

Ondrej Dusek, Karin Sevegnani, Ioannis Konstas, Verena Rieser

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a recurrent neural network based system for automatic quality estimation of natural language generation (NLG) outputs, which jointly learns to assign numerical ratings to individual outputs and to provide pairwise rankings of two different outputs. The latter is trained using pairwise hinge loss over scores from two copies of the rating network. We use learning to rank and synthetic data to improve the quality of ratings assigned by our system: We synthesise training pairs of distorted system outputs and train the system to rank the less distorted one higher. This leads to a 12% increase in correlation with human ratings over the previous benchmark. We also establish the state of the art on the dataset of relative rankings from the E2E NLG Challenge (Dusek et al., 2019), where synthetic data lead to a 4% accuracy increase over the base model.
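
The pairwise ranking objective mentioned in the abstract can be illustrated with a short sketch. This is not the authors' implementation. The margin of 1.0, the function names, and the example scores are assumptions made for illustration, and the scores stand in for the outputs of the two weight-sharing copies of the rating network.

```python
def pairwise_hinge_loss(score_better, score_worse, margin=1.0):
    """Hinge loss for one training pair.

    score_better: score for the output that should rank higher
                  (e.g. the less distorted one in a synthetic pair).
    score_worse:  score for the output that should rank lower.
    margin:       assumed value; the abstract does not specify one.
    """
    # The loss is zero once the better output leads by at least `margin`.
    return max(0.0, margin - (score_better - score_worse))


def mean_pairwise_hinge_loss(score_pairs, margin=1.0):
    """Average hinge loss over (score_better, score_worse) pairs."""
    losses = [pairwise_hinge_loss(b, w, margin) for b, w in score_pairs]
    return sum(losses) / len(losses)


# Hypothetical scores: the first pair is ranked correctly with enough margin,
# the second is ranked the wrong way round and is penalised.
example_pairs = [(3.0, 1.0), (1.0, 1.5)]
example_loss = mean_pairwise_hinge_loss(example_pairs)
```

Because both scores come from the same network, minimising this loss pushes the rater to separate better outputs from worse ones, which is how ranking data can also inform the numerical ratings.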
Original language: English
Title of host publication: Proceedings of the 12th International Conference on Natural Language Generation
Publisher: Association for Computational Linguistics
Pages: 369–376
Number of pages: 8
ISBN (Electronic): 9781950737949
Publication status: Published - 2019
Event: 12th International Conference on Natural Language Generation 2019 - Tokyo, Japan
Duration: 28 Oct 2019 – 1 Nov 2019

Conference

Conference: 12th International Conference on Natural Language Generation 2019
Country/Territory: Japan
City: Tokyo
Period: 28/10/19 – 1/11/19
