Abstract
This paper proposes a data-driven method for concept-to-text generation, the task of automatically producing textual output from non-linguistic input. A key insight in our approach is to reduce the tasks of content selection ("what to say") and surface realization ("how to say it") to a common parsing problem. We define a probabilistic context-free grammar that describes the structure of the input (a corpus of database records and text describing some of them) and represent it compactly as a weighted hypergraph. The hypergraph structure encodes exponentially many derivations, which we rerank discriminatively using local and global features. We propose a novel decoding algorithm for finding the best-scoring derivation and generating in this setting. Experimental evaluation on the ATIS domain shows that our model outperforms a competitive discriminative system both in terms of BLEU and in a judgment elicitation study.
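To make the hypergraph idea concrete, the following is a minimal sketch of how a weighted hypergraph can pack many derivations into a small structure, and how a best-scoring derivation can be read off with a Viterbi-style dynamic program. It is not the paper's decoding algorithm (which reranks derivations with local and global features). The node names, rule labels, and weights in `HYPERGRAPH` are invented for illustration, as are the helper functions `count_derivations` and `best_derivation`.

```python
import math
from functools import lru_cache

# Toy weighted hypergraph (hypothetical, not taken from the paper).
# Each head node maps to a list of hyperedges: (tail nodes, weight, rule label).
# A hyperedge with no tails is a terminal rule that emits words.
HYPERGRAPH = {
    "S": [(("R_flight", "R_search"), 0.6, "S -> flight search"),
          (("R_flight",), 0.4, "S -> flight")],
    "R_flight": [(("W_from", "W_to"), 0.7, "flight -> from to"),
                 (("W_to",), 0.3, "flight -> to")],
    "R_search": [((), 0.9, "show me flights"),
                 ((), 0.1, "list")],
    "W_from": [((), 0.8, "from denver"),
               ((), 0.2, "leaving denver")],
    "W_to": [((), 0.6, "to boston"),
             ((), 0.4, "arriving in boston")],
}


@lru_cache(maxsize=None)
def count_derivations(node):
    """Number of distinct derivations rooted at `node`."""
    total = 0
    for tails, _weight, _label in HYPERGRAPH[node]:
        # Choices for each tail are independent, so their counts multiply.
        total += math.prod(count_derivations(t) for t in tails)
    return total


@lru_cache(maxsize=None)
def best_derivation(node):
    """Return (log score, derivation tree) of the best derivation at `node`."""
    candidates = []
    for tails, weight, label in HYPERGRAPH[node]:
        score = math.log(weight)
        children = []
        for t in tails:
            child_score, child_tree = best_derivation(t)
            score += child_score
            children.append(child_tree)
        candidates.append((score, (label, tuple(children))))
    # Keep only the highest-scoring hyperedge for this node.
    return max(candidates, key=lambda c: c[0])


if __name__ == "__main__":
    log_score, tree = best_derivation("S")
    print("derivations encoded:", count_derivations("S"))
    print("best probability:", math.exp(log_score))
    print("best derivation:", tree)
```

Because each node stores only its own hyperedges and shares its sub-derivations with every parent, the number of derivations grows multiplicatively with depth while the structure itself stays linear in the number of rules. Memoization ensures each node is solved once.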
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) |
| Publisher | Association for Computational Linguistics |
| Pages | 369-378 |
| Number of pages | 10 |
| ISBN (Print) | 9781937284244 |
| Publication status | Published - Jul 2012 |
| Event | 50th Annual Meeting of the Association for Computational Linguistics 2012, Jeju Island, Republic of Korea. Duration: 8 Jul 2012 → 14 Jul 2012 |
Conference
| Conference | 50th Annual Meeting of the Association for Computational Linguistics 2012 |
|---|---|
| Abbreviated title | ACL 2012 |
| Country/Territory | Korea, Republic of |
| City | Jeju Island |
| Period | 8/07/12 → 14/07/12 |
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Software