Abstract
The majority of NLG evaluation relies on automatic metrics, such as BLEU. In this paper, we motivate the need for novel, system- and data-independent automatic evaluation methods: we investigate a wide range of metrics, including state-of-the-art word-based and novel grammar-based ones, and demonstrate that they only weakly reflect human judgements of system outputs as generated by data-driven, end-to-end NLG. We also show that metric performance is data- and system-specific. Nevertheless, our results also suggest that automatic metrics perform reliably at the system level and can support system development by finding cases where a system performs poorly.
Original language | English |
---|---|
Title of host publication | Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing |
Publisher | Association for Computational Linguistics |
Pages | 2231-2242 |
Number of pages | 12 |
ISBN (Electronic) | 978-1-945626-83-8 |
DOIs | |
Publication status | Published - 10 Sept 2017 |
Event | 2017 Conference on Empirical Methods in Natural Language Processing - Øksnehallen, Copenhagen, Denmark. Duration: 9 Sept 2017 → 11 Sept 2017 |
Conference
Conference | 2017 Conference on Empirical Methods in Natural Language Processing |
---|---|
Abbreviated title | EMNLP 2017 |
Country/Territory | Denmark |
City | Copenhagen |
Period | 9/09/17 → 11/09/17 |
Keywords
- natural language generation
- natural language processing
- evaluation
- evaluation metrics
Fingerprint
Dive into the research topics of 'Why We Need New Evaluation Metrics for NLG'. Together they form a unique fingerprint.
Datasets
- Human Ratings of Natural Language Generation Outputs
Novikova, J. (Creator), Dusek, O. (Creator), Cercas Curry, A. (Creator) & Rieser, V. (Creator), Heriot-Watt University, 2017
https://github.com/jeknov/EMNLP_17_submission
Dataset
Profiles
- Verena Rieser
- School of Mathematical & Computer Sciences - Professor
- School of Mathematical & Computer Sciences, Computer Science - Professor
Person: Academic (Research & Teaching)