TY - JOUR
T1 - Does this list contain what you were searching for? Learning adaptive dialogue strategies for interactive question answering
AU - Rieser, V.
AU - Lemon, O.
PY - 2009
AB - Policy learning is an active topic in dialogue systems research, but it has not been explored in relation to interactive question answering (IQA). We take a first step in learning adaptive interaction policies for question answering: we address the questions of how to acquire enough reliable query constraints, how many database results to present to the user, and when to present them, given the competing trade-offs between the length of the answer list, the length of the interaction, the type of database and the noise in the communication channel. The operating conditions are reflected in an objective function which we use to derive a hand-coded threshold-based policy and rewards to train a reinforcement learning policy. The same objective function is used for evaluation. We show that we can learn strategies for this complex trade-off problem which perform significantly better than a variety of hand-coded policies, for a wide range of noise conditions, user types, database types and turn penalties. Our policy learning framework thus covers a wide spectrum of operating conditions. The learned policies produce an average relative increase in reward of 86.78% over the hand-coded policies. In 93% of the cases the learned policies perform significantly better than the hand-coded ones (p < .001). Furthermore, we show that the type of database has a significant effect on learning, and we give qualitative descriptions of the learned IQA policies.
DO - 10.1017/S1351324908004907
M3 - Article
SN - 1351-3249
VL - 15
SP - 55
EP - 72
JF - Natural Language Engineering
IS - 1
ER -