Abstract
Concept-to-text generation refers to the task of automatically producing textual output from non-linguistic input. We present a joint model that captures content selection ("what to say") and surface realization ("how to say it") in an unsupervised, domain-independent fashion. Rather than breaking up the generation process into a sequence of local decisions, we define a probabilistic context-free grammar that globally describes the inherent structure of the input (a corpus of database records and text describing some of them). We represent our grammar compactly as a weighted hypergraph and recast generation as the task of finding the best derivation tree for a given input. Experimental evaluation on several domains achieves results competitive with state-of-the-art systems that use domain-specific constraints, explicit feature engineering, or labeled data.
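The core computational step described in the abstract, finding the best derivation tree in a weighted hypergraph, can be sketched as a Viterbi-style dynamic program. The following is a minimal illustrative toy, not the paper's actual implementation: hyperedges, weights, and node names are hypothetical, weights are log-probabilities, and the hypergraph is assumed acyclic.

```python
from collections import namedtuple

# A hyperedge rewrites a head node from a sequence of tail nodes,
# with a log-probability weight (hypothetical toy encoding).
Hyperedge = namedtuple("Hyperedge", "head tails weight")

def best_derivation(edges, terminals, goal):
    """Viterbi search for the highest-scoring derivation in an acyclic
    weighted hypergraph. `terminals` are axiom nodes with score 0.0
    (log-prob 1); returns (best score for `goal`, backpointer map)."""
    score = {t: 0.0 for t in terminals}
    back = {}
    changed = True
    while changed:  # relax edges until a fixpoint (terminates: acyclic)
        changed = False
        for e in edges:
            if all(t in score for t in e.tails):
                cand = e.weight + sum(score[t] for t in e.tails)
                if cand > score.get(e.head, float("-inf")):
                    score[e.head] = cand
                    back[e.head] = e
                    changed = True
    return score.get(goal), back

def extract_tree(back, node):
    """Read off the best derivation tree from the backpointers."""
    if node not in back:
        return node  # terminal node
    return (node, [extract_tree(back, t) for t in back[node].tails])

# Toy grammar fragment (entirely hypothetical):
edges = [
    Hyperedge("S", ("NP", "VP"), -0.1),
    Hyperedge("NP", ("john",), -0.5),
    Hyperedge("VP", ("runs",), -0.3),
]
best_score, back = best_derivation(edges, {"john", "runs"}, "S")
tree = extract_tree(back, "S")
```

On this toy input the best score is -0.9 (the sum of the three edge weights) and the recovered tree is `("S", [("NP", ["john"]), ("VP", ["runs"])])`; the actual model scores derivations that jointly select database records and realize them as text.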
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics |
| Subtitle of host publication | Human Language Technologies |
| Publisher | Association for Computational Linguistics |
| Pages | 752-761 |
| Number of pages | 10 |
| ISBN (Electronic) | 9781937284206 |
| Publication status | Published - Jun 2012 |
| Event | 2012 Conference of the North American Chapter of the Association for Computational Linguistics, Montreal, Canada |
| Event duration | 3 Jun 2012 → 8 Jun 2012 |
Conference
| Conference | 2012 Conference of the North American Chapter of the Association for Computational Linguistics |
|---|---|
| Abbreviated title | NAACL HLT 2012 |
| Country/Territory | Canada |
| City | Montreal |
| Period | 3/06/12 → 8/06/12 |
ASJC Scopus subject areas
- Language and Linguistics
- Computer Science Applications
- Linguistics and Language