TY - GEN
T1 - Learning non-cooperative dialogue policies to beat opponent models: "The good, the bad and the ugly"
AU - Efstathiou, Ioannis
AU - Lemon, Oliver
PY - 2015/8/24
Y1 - 2015/8/24
AB - Non-cooperative dialogue capabilities have been identified as important in a variety of application areas, including education, military operations, video games, police investigation and healthcare. In prior work, it was shown how agents can learn to use explicit manipulation moves in dialogue (e.g. "I really need wheat") to manipulate adversaries in a simple trading game. The adversaries had a very simple opponent model. In this paper we implement a more complex opponent model for adversaries: we now model all trading dialogue moves as affecting the adversary's opponent model, and we work in a more complex game setting, Catan. Here we show that (even in such a non-stationary environment) agents can learn to be legitimately persuasive ("the good") or deceitful ("the bad"). We achieve up to 11% higher success rates than a reasonable hand-crafted trading dialogue strategy ("the ugly"). We also present a novel way of encoding the state space for Reinforcement Learning of trading dialogues that reduces the state-space size to 0.5% of the original, and so reduces training times dramatically.
UR - http://flov.gu.se/digitalAssets/1537/1537599_semdial2015_godial_proceedings.pdf
M3 - Conference contribution
T3 - Proceedings (SemDial)
SP - 33
EP - 41
BT - Proceedings of the 19th Workshop on the Semantics and Pragmatics of Dialogue
A2 - Howes, Christine
A2 - Larsson, Staffan
T2 - 19th Workshop on the Semantics and Pragmatics of Dialogue
Y2 - 24 August 2015 through 26 August 2015
ER -