Alana: Social Dialogue using an Ensemble Model and a Ranker trained on User Feedback

Ioannis Papaioannou, Amanda Cercas Curry, Jose Part, Igor Shalyminov, Xu Xinnuo, Yanchao Yu, Ondrej Dusek, Verena Rieser, Oliver Lemon

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We describe our Alexa Prize system (called ‘Alana’), which consists of an ensemble of bots, combining rule-based and machine learning systems, and using a contextual ranking mechanism to choose system responses. This paper reports on the version of the system developed and evaluated in the semi-finals of the competition (i.e. up to 15 August 2017), but not on subsequent enhancements. The ranker for this system was trained on real user feedback received during the competition, where we address the problem of how to train on such noisy and sparse feedback. In order to avoid initial problems of inappropriate and boring utterances coming from big datasets such as Reddit and Twitter, we later focussed on ‘clean’ data sources such as news and facts. We report on experiments with different ranking functions and versions of our NewsBot. We find that a multi-turn news strategy is beneficial, and that a ranker trained on the rating feedback from users is also effective. Our system continuously improved using the data gathered over the course of the competition (1 July – 15 August). Our final user score (averaged user rating over the whole semi-finals period) was 3.12, and we achieved 3.3 for the averaged user rating over the last week of the semi-finals (8–15 August 2017). We were also able to achieve long dialogues (average 10.7 turns) during the competition period. In subsequent weeks, after the end of the semi-final competition, we achieved our highest scores of 3.52 (daily average, 18 October) and 3.45 (weekly average, 23–24 October), an average dialogue length of 14.6 turns (1 October), and a median dialogue length of 2.25 minutes (7-day average, 10 October).
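The ensemble-plus-ranker design described above can be sketched minimally: several bots each propose a candidate response to the current context, and a ranker whose weights reflect past user feedback selects which one to say. The feature set, the linear scoring function, and all names below (`Candidate`, `make_ranker`, the per-bot priors) are illustrative assumptions for this sketch, not Alana's actual model.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Candidate:
    bot: str   # which ensemble member proposed this response
    text: str  # the proposed response

def overlap(context: str, response: str) -> float:
    """Crude contextual feature: fraction of response words shared with the context."""
    ctx, resp = set(context.lower().split()), set(response.lower().split())
    return len(ctx & resp) / max(len(resp), 1)

def make_ranker(bot_priors: dict) -> Callable[[str, Candidate], float]:
    """Linear ranker: a per-bot prior (e.g. fit to user ratings) plus a context feature."""
    def score(context: str, cand: Candidate) -> float:
        return bot_priors.get(cand.bot, 0.0) + overlap(context, cand.text)
    return score

def select_response(context: str, candidates: List[Candidate], ranker) -> Candidate:
    """Pick the highest-scoring candidate from the ensemble."""
    return max(candidates, key=lambda c: ranker(context, c))

# Toy usage: an on-topic news reply should outrank generic chit-chat.
ranker = make_ranker({"NewsBot": 0.2, "ChitChat": 0.1})
cands = [Candidate("NewsBot", "Here is today's news about space travel."),
         Candidate("ChitChat", "That's nice!")]
best = select_response("tell me some news about space", cands, ranker)
```

In a real system the per-bot priors would be replaced by a model trained on the noisy, sparse rating signal, which is the core problem the paper addresses.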
Original language: English
Title of host publication: 2017 Alexa Prize Proceedings
Publication status: Published - 2017

Cite this

Papaioannou, I., Cercas Curry, A., Part, J., Shalyminov, I., Xinnuo, X., Yu, Y., ... Lemon, O. (2017). Alana: Social Dialogue using an Ensemble Model and a Ranker trained on User Feedback. In 2017 Alexa Prize Proceedings
