Abstract
We address the problem of interactively learning perceptually grounded word meanings in a multimodal dialogue system. We design semantic and visual processing systems to support this and illustrate how they can be integrated. We then focus on comparing the performance (Precision, Recall, F1, AUC) of three state-of-the-art attribute classifiers (MLKNN, DAP, and SVMs) for the purpose of interactive language grounding, on the aPascal-aYahoo datasets. In prior work, results were presented for object classification using these methods for attribute labelling, whereas we focus on their performance for attribute labelling itself. We find that while these methods can perform well for some attributes (e.g. head, ears, furry), none of the models performs well over the whole attribute set, and none supports incremental learning. This leads us to suggest directions for future work.
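The abstract names the classifiers and metrics but not the evaluation pipeline. As a rough illustration of the per-attribute comparison it describes, a scikit-learn sketch of the SVM case might look like the following; the function and variable names are hypothetical, not the paper's code, and MLKNN and DAP would need their own implementations:

```python
# A minimal, hypothetical sketch of per-attribute evaluation: one binary
# SVM per attribute, scored with Precision, Recall, F1 and AUC.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score

def evaluate_per_attribute(X_train, Y_train, X_test, Y_test, attribute_names):
    """Train one binary SVM per attribute column and report P/R/F1/AUC.

    X_*: (n_samples, n_features) image features;
    Y_*: (n_samples, n_attributes) binary attribute labels.
    """
    results = {}
    for i, name in enumerate(attribute_names):
        clf = LinearSVC(C=1.0).fit(X_train, Y_train[:, i])
        y_pred = clf.predict(X_test)
        y_score = clf.decision_function(X_test)  # continuous scores for AUC
        p, r, f1, _ = precision_recall_fscore_support(
            Y_test[:, i], y_pred, average="binary", zero_division=0)
        # AUC is undefined when the test labels contain only one class
        auc = (roc_auc_score(Y_test[:, i], y_score)
               if len(np.unique(Y_test[:, i])) > 1 else float("nan"))
        results[name] = {"precision": p, "recall": r, "f1": f1, "auc": auc}
    return results
```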
| Original language | English |
| --- | --- |
| Title of host publication | Proceedings of the 4th Workshop on Vision and Language (VL'15) |
| Pages | 60-69 |
| Number of pages | 10 |
| Publication status | Published - 2015 |
| Event | 4th Workshop on Vision and Language, Lisbon, Portugal. Duration: 18 Sept 2015 → … |
Conference
| Conference | 4th Workshop on Vision and Language |
| --- | --- |
| Abbreviated title | VL'15 |
| Country/Territory | Portugal |
| City | Lisbon |
| Period | 18/09/15 → … |
Datasets
- aPascal-aYahoo Image Data Collection. Yu, Y. (Creator), Department of Computer Science, University of Illinois at Urbana-Champaign, 2009. http://vision.cs.uiuc.edu/attributes/