Abstract
In a language generation system, a content planner selects which elements must be included in the output text and the ordering between them. Recent empirical approaches perform content selection without any ordering and thus have no means to ensure that the output is coherent. In this paper we focus on the problem of generating text from a database and present a trainable end-to-end generation system that includes both content selection and ordering. Content plans are represented intuitively by a set of grammar rules that operate on the document level and are acquired automatically from training data. We develop two approaches: the first is inspired by Rhetorical Structure Theory and represents the document as a tree of discourse relations between database records; the second requires little linguistic sophistication and uses tree structures to represent global patterns of database record sequences within a document. Experimental evaluation on two domains yields considerable improvements over the state of the art for both approaches.
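To make the idea of document-level grammar rules over database records more concrete, the following is a minimal sketch and not the authors' implementation. The grammar, the nonterminal names (`DOC`, `SKY`, `TEMP_BLOCK`, `WIND_BLOCK`) and the record types (`skyCover`, `temperature`, `windDir`, `windSpeed`) are hypothetical and hand-written for illustration; in the paper the rules are acquired automatically from training data. Each sequence derivable from the start symbol is one content plan, so a single derivation fixes both which record types are selected and the order in which they appear.

```python
from itertools import product

# Hypothetical toy grammar (an assumption for illustration only).
# Nonterminals map to alternative right-hand sides; any symbol that is
# not a key of the dict is a terminal, i.e. a database record type.
GRAMMAR = {
    "DOC": [["SKY", "TEMP_BLOCK"], ["SKY", "TEMP_BLOCK", "WIND_BLOCK"]],
    "SKY": [["skyCover"]],
    "TEMP_BLOCK": [["temperature"]],
    "WIND_BLOCK": [["windDir", "windSpeed"], ["windSpeed"]],
}


def expand(symbol, grammar):
    """Return every sequence of record types derivable from `symbol`.

    Assumes the grammar is non-recursive, so the set of plans is finite.
    """
    if symbol not in grammar:
        return [[symbol]]  # terminal: a single record type
    plans = []
    for rhs in grammar[symbol]:
        # Expand each child, then combine the alternatives in order.
        child_options = [expand(child, grammar) for child in rhs]
        for combo in product(*child_options):
            plans.append([rec for part in combo for rec in part])
    return plans


def is_valid_plan(record_types, grammar, start="DOC"):
    """Check whether an ordered list of record types is a licensed plan."""
    return list(record_types) in expand(start, grammar)


if __name__ == "__main__":
    for plan in expand("DOC", GRAMMAR):
        print(" -> ".join(plan))
```

Under this toy grammar, a plan that mentions temperature before sky cover is rejected by `is_valid_plan`, which illustrates how ordering constraints can be enforced alongside selection.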
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing |
| Publisher | Association for Computational Linguistics |
| Pages | 1503-1514 |
| Number of pages | 12 |
| ISBN (Electronic) | 9781937284978 |
| Publication status | Published - Oct 2013 |
| Event | 2013 Conference on Empirical Methods in Natural Language Processing - Seattle, United States; Duration: 18 Oct 2013 → 21 Oct 2013 |
Conference
| Conference | 2013 Conference on Empirical Methods in Natural Language Processing |
|---|---|
| Abbreviated title | EMNLP 2013 |
| Country/Territory | United States |
| City | Seattle |
| Period | 18/10/13 → 21/10/13 |
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Information Systems
- Computer Vision and Pattern Recognition