Abstract
Work on training semantic slot labellers for use in Natural Language Processing applications has typically either relied on large amounts of labelled input data or assumed entirely unlabelled inputs. The former technique tends to be costly to apply, while the latter is often less accurate than its supervised counterpart. Here, we present a semi-supervised learning approach that automatically labels the semantic slots in a set of training data and aims to strike a balance between dependence on labelled data and prediction accuracy. The essence of our algorithm is to cluster clauses based on a similarity function that combines lexical and semantic information. We present experiments that compare different similarity functions for both our semi-supervised setting and a fully unsupervised baseline. While semi-supervised learning expectedly outperforms unsupervised learning, our results show that (1) this effect can be observed with very few labelled training instances, and increasing the size of the training data does not lead to better performance, and (2) lexical and semantic information contribute differently in different domains, so that clustering based on both types of information offers the best generalisation.
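The paper itself does not publish code, but the core idea in the abstract (cluster clauses by a combined lexical-plus-semantic similarity, then propagate slot labels from a few labelled seeds) can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration only: the `tokens` and `sem_tags` clause annotations, the Jaccard components, the mixing weight `alpha`, the greedy threshold clustering, and the majority-vote labelling are hypothetical stand-ins, not the authors' method.

```python
from collections import Counter

def jaccard(a, b):
    """Jaccard overlap of two item collections (0.0 when both are empty)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

def combined_similarity(clause_a, clause_b, alpha=0.5):
    """Convex combination of a lexical and a semantic similarity.
    `tokens` (surface words) and `sem_tags` (e.g. semantic classes of the
    clause's content words) are hypothetical per-clause annotations."""
    lex = jaccard(clause_a["tokens"], clause_b["tokens"])
    sem = jaccard(clause_a["sem_tags"], clause_b["sem_tags"])
    return alpha * lex + (1 - alpha) * sem

def cluster(clauses, threshold=0.4, alpha=0.5):
    """Greedy single-pass clustering: attach each clause to the cluster
    containing its best match above `threshold`, else start a new cluster."""
    clusters = []
    for c in clauses:
        best, best_sim = None, threshold
        for cl in clusters:
            sim = max(combined_similarity(c, m, alpha) for m in cl)
            if sim >= best_sim:
                best, best_sim = cl, sim
        if best is not None:
            best.append(c)
        else:
            clusters.append([c])
    return clusters

def label_clusters(clusters):
    """Semi-supervised step: each cluster takes the majority slot label
    among its (few) labelled members; clusters with no seeds stay None."""
    labelled = []
    for cl in clusters:
        seeds = [c["label"] for c in cl if c.get("label")]
        majority = Counter(seeds).most_common(1)[0][0] if seeds else None
        labelled.append((majority, cl))
    return labelled

if __name__ == "__main__":
    clauses = [
        {"tokens": ["book", "a", "table", "for", "two"],
         "sem_tags": ["Reservation", "Quantity"], "label": "booking"},
        {"tokens": ["reserve", "a", "table", "for", "four"],
         "sem_tags": ["Reservation", "Quantity"]},   # unlabelled clause
        {"tokens": ["what", "time", "do", "you", "open"],
         "sem_tags": ["OpeningHours"]},
    ]
    for label, members in label_clusters(cluster(clauses)):
        print(label, [" ".join(c["tokens"]) for c in members])
```

On this toy input, the single labelled seed is enough to label the whole reservation cluster, which mirrors the abstract's finding that very few labelled instances already yield the semi-supervised gain; the `alpha` weight is where the paper's lexical-versus-semantic trade-off would be tuned per domain.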
Original language | English |
---|---|
Title of host publication | Proceedings - 2014 13th International Conference on Machine Learning and Applications, ICMLA 2014 |
Publisher | IEEE |
Pages | 500-505 |
Number of pages | 6 |
ISBN (Print) | 9781479974153 |
Publication status | Published - 2014 |
Event | 2014 13th International Conference on Machine Learning and Applications - Detroit, United States Duration: 3 Dec 2014 → 6 Dec 2014 |
Conference
Conference | 2014 13th International Conference on Machine Learning and Applications |
---|---|
Abbreviated title | ICMLA 2014 |
Country/Territory | United States |
City | Detroit |
Period | 3/12/14 → 6/12/14 |
Keywords
- interactive systems
- natural language processing
- semantic slot labelling
- semi-supervised learning