Abstract
Training a statistical surface realiser typically relies on labelled training data or parallel data sets, such as corpora of paraphrases. Obtaining such data for new domains is not only time-consuming but also restricts the incorporation of new semantic slots during an interaction, i.e. in an online learning scenario for automatically extended domains. Here, we present an alternative approach to statistical surface realisation from unlabelled data through automatic semantic slot labelling. The essence of our algorithm is to cluster clauses based on a similarity function that combines lexical and semantic information. The resulting annotations must be reliable enough to be used within a spoken dialogue system. We compare different similarity functions and evaluate our surface realiser, trained from unlabelled data, in a human rating study. Results confirm that a surface realiser trained from automatic slot labels can produce outputs of comparable quality to those of a realiser trained from human-labelled inputs.
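The abstract does not specify which lexical or semantic measures are used, how they are weighted, or which clustering algorithm groups the clauses. The Python below is therefore only a minimal sketch under stated assumptions: lexical similarity is taken to be Jaccard overlap of word sets, semantic similarity is cosine similarity over a hypothetical per-clause feature dictionary (`sem`, e.g. slot-type indicators), the two are mixed with an assumed weight `alpha`, and clauses are grouped by single-link clustering at an assumed `threshold` (connected components of the graph of sufficiently similar pairs). All names, parameters and example clauses are illustrative and are not the authors' implementation.

```python
import math

def lexical_similarity(a, b):
    """Assumed lexical measure: Jaccard overlap of lower-cased word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def semantic_similarity(u, v):
    """Assumed semantic measure: cosine similarity of sparse feature dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def combined_similarity(c1, c2, alpha=0.5):
    """Hypothetical linear mix: alpha weights lexical vs. semantic evidence."""
    lex = lexical_similarity(c1["text"], c2["text"])
    sem = semantic_similarity(c1["sem"], c2["sem"])
    return alpha * lex + (1 - alpha) * sem

def cluster_clauses(clauses, threshold=0.5, alpha=0.5):
    """Single-link clustering cut at `threshold`, computed with union-find.

    Two clauses end up in the same cluster if a chain of pairs with
    combined similarity >= threshold connects them.
    """
    parent = list(range(len(clauses)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(clauses)):
        for j in range(i + 1, len(clauses)):
            if combined_similarity(clauses[i], clauses[j], alpha) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(clauses)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy, invented clauses; "sem" holds hypothetical slot-type features.
CLAUSES = [
    {"text": "it serves italian food", "sem": {"FOOD": 1.0}},
    {"text": "it serves chinese food", "sem": {"FOOD": 1.0}},
    {"text": "it is in the city centre", "sem": {"AREA": 1.0}},
    {"text": "it is located in the riverside area", "sem": {"AREA": 1.0}},
]

if __name__ == "__main__":
    print(cluster_clauses(CLAUSES, alpha=0.5))  # [[0, 1], [2, 3]]
    print(cluster_clauses(CLAUSES, alpha=1.0))  # [[0, 1], [2], [3]]
```

With `alpha=1.0` (lexical overlap only), the two location clauses share too few words to be merged. Mixing in the semantic features (`alpha=0.5`) groups them into one slot cluster, which illustrates why a similarity function combining both kinds of information might label slots more reliably than either alone.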
Original language | English |
---|---|
Title of host publication | 2014 IEEE Workshop on Spoken Language Technology, SLT 2014 - Proceedings |
Publisher | IEEE |
Pages | 112-117 |
Number of pages | 6 |
ISBN (Print) | 9781479971299 |
Publication status | Published - 2014 |
Event | 6th IEEE Workshop on Spoken Language Technology 2014 - South Lake Tahoe, United States. Duration: 7 Dec 2014 → 10 Dec 2014 |
Conference
Conference | 6th IEEE Workshop on Spoken Language Technology 2014 |
---|---|
Abbreviated title | SLT 2014 |
Country/Territory | United States |
City | South Lake Tahoe |
Period | 7 Dec 2014 → 10 Dec 2014 |
Keywords
- Dialogue systems
- Semantic slot labelling
- Surface realisation
- Unsupervised and supervised learning
ASJC Scopus subject areas
- Computer Science Applications
- Human-Computer Interaction
- Computer Vision and Pattern Recognition
- Artificial Intelligence
- Language and Linguistics