Learning non-cooperative dialogue policies to beat opponent models: "The good, the bad and the ugly"

Ioannis Efstathiou, Oliver Lemon

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Non-cooperative dialogue capabilities have been identified as important in a variety of application areas, including education, military operations, video games, police investigation and healthcare. In prior work, it was shown how agents can learn to use explicit manipulation moves in dialogue (e.g. "I really need wheat") to manipulate adversaries in a simple trading game. The adversaries had a very simple opponent model. In this paper we implement a more complex opponent model for adversaries, we now model all trading dialogue moves as affecting the adversary's opponent model, and we work in a more complex game setting: Catan. Here we show that (even in such a non-stationary environment) agents can learn to be legitimately persuasive ("the good") or deceitful ("the bad"). We achieve up to 11% higher success rates than a reasonable hand-crafted trading dialogue strategy ("the ugly"). We also present a novel way of encoding the state space for Reinforcement Learning of trading dialogues that reduces the state-space size to 0.5% of the original, and so reduces training times dramatically.
Original language: English
Title of host publication: Proceedings of the 19th Workshop on the Semantics and Pragmatics of Dialogue
Editors: Christine Howes, Staffan Larsson
Pages: 33-41
Number of pages: 9
Publication status: Published - 24 Aug 2015
Event: 19th Workshop on the Semantics and Pragmatics of Dialogue - Gothenburg, Sweden
Duration: 24 Aug 2015 to 26 Aug 2015

Publication series

Name: Proceedings (SemDial)
ISSN (Print): 2308-2275

Conference

Conference: 19th Workshop on the Semantics and Pragmatics of Dialogue
Abbreviated title: SEMDIAL 2015 - goDIAL
Country/Territory: Sweden
City: Gothenburg
Period: 24/08/15 to 26/08/15
