Non-cooperative dialogue capabilities have been identified as important in a variety of application areas, including education, military operations, video games, police investigation, and healthcare. Prior work showed how agents can learn to use explicit manipulation moves in dialogue (e.g. "I really need wheat") to manipulate adversaries in a simple trading game, where the adversaries had only a very simple opponent model. In this paper we implement a more complex opponent model for adversaries, model all trading dialogue moves as affecting the adversary's opponent model, and work in a more complex game setting: the board game Catan. We show that, even in such a non-stationary environment, agents can learn to be legitimately persuasive ("the good") or deceitful ("the bad"), achieving up to 11% higher success rates than a reasonable hand-crafted trading dialogue strategy ("the ugly"). We also present a novel way of encoding the state space for Reinforcement Learning of trading dialogues that reduces the state-space size to 0.5% of the original, dramatically reducing training times.
Title of host publication: Proceedings of the 19th Workshop on the Semantics and Pragmatics of Dialogue
Editors: Christine Howes, Staffan Larsson
Number of pages: 9
Publication status: Published - 24 Aug 2015
Event: 19th Workshop on the Semantics and Pragmatics of Dialogue - Gothenburg, Sweden
Duration: 24 Aug 2015 → 26 Aug 2015
Conference: 19th Workshop on the Semantics and Pragmatics of Dialogue
Abbreviated title: SEMDIAL 2015 - goDIAL
Period: 24/08/15 → 26/08/15
Efstathiou, I., & Lemon, O. (2015). Learning non-cooperative dialogue policies to beat opponent models: "The good, the bad and the ugly". In C. Howes, & S. Larsson (Eds.), Proceedings of the 19th Workshop on the Semantics and Pragmatics of Dialogue (pp. 33-41). (Proceedings (SemDial)).