Abstract
Natural, spontaneous dialogue proceeds incrementally on a word-by-word basis, and it contains many sorts of disfluency, such as mid-utterance (or mid-sentence) hesitations, interruptions, and self-corrections. But training data for machine learning approaches to dialogue processing is often either cleaned up or wholly synthetic in order to avoid such phenomena. The question then arises of how well systems trained on such clean data generalise to real spontaneous dialogue, or indeed whether they are trainable at all on naturally occurring dialogue data. To answer this question, we created a new corpus called bAbI+ by systematically adding natural spontaneous incremental dialogue phenomena, such as restarts and self-corrections, to Facebook AI Research's bAbI dialogue dataset. We then explore the performance of a state-of-the-art retrieval model, MemN2N (Bordes et al., 2017; Sukhbaatar et al., 2015), on this more natural dataset. Results show that the semantic accuracy of the MemN2N model drops drastically, and that although it is in principle able to learn to process the constructions in bAbI+, it needs an impractical amount of training data to do so. Finally, we show that an incremental semantic parser, DyLan, achieves 100% semantic accuracy on both bAbI and bAbI+, highlighting the generalisation properties of linguistically informed dialogue models.
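The abstract does not spell out how the disfluencies were generated. As a rough illustration only, the Python sketch below applies the three kinds of transformation it mentions (hesitation, restart, self-correction) to a tokenised user utterance. The filler "uhm", the editing term "sorry", the uniform random choice of insertion point, and the function names are all assumptions made for this example; this is not the procedure actually used to build bAbI+.

```python
import random
from typing import List

# Hypothetical surface forms; the markers bAbI+ actually uses may differ.
FILLER = "uhm"
EDIT_TERM = "sorry"


def add_hesitation(tokens: List[str], rng: random.Random) -> List[str]:
    """Insert a filler before a randomly chosen token (never utterance-initially)."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(1, len(tokens))
    return tokens[:i] + [FILLER] + tokens[i:]


def add_restart(tokens: List[str], rng: random.Random) -> List[str]:
    """Abandon a non-empty prefix, hesitate, then restart the whole utterance."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(1, len(tokens))
    return tokens[:i] + [FILLER] + tokens


def add_self_correction(tokens: List[str], slot_index: int, wrong_value: str) -> List[str]:
    """Utter a wrong value at slot_index, then repair it with the intended value."""
    return tokens[:slot_index] + [wrong_value, FILLER, EDIT_TERM] + tokens[slot_index:]


if __name__ == "__main__":
    rng = random.Random(0)
    utterance = "i would like spanish food".split()
    print(" ".join(add_hesitation(utterance, rng)))
    print(" ".join(add_restart(utterance, rng)))
    # -> i would like french uhm sorry spanish food
    print(" ".join(add_self_correction(utterance, 3, "french")))
```

Each transformation leaves the intended utterance intact after the inserted material, so the meaning a system should extract is unchanged; only the surface form becomes harder to process.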
Original language | English |
---|---|
Title of host publication | Proceedings of the 21st Workshop on the Semantics and Pragmatics of Dialogue (SemDial 2017 - SaarDial) |
Editors | Volha Petukhova, Ye Tian |
Pages | 125-133 |
Publication status | Published - Aug 2017 |
Event | 21st Workshop on the Semantics and Pragmatics of Dialogue, Saarbrücken, Germany (15 Aug 2017 → 17 Aug 2017); conference number 21; http://www.saardial.uni-saarland.de/?page_id=2 |
Workshop
Workshop | 21st Workshop on the Semantics and Pragmatics of Dialogue |
---|---|
Abbreviated title | SemDial 2017 - SaarDial |
Country/Territory | Germany |
City | Saarbrücken |
Period | 15/08/17 → 17/08/17 |
Internet address | http://www.saardial.uni-saarland.de/?page_id=2 |