Abstract
People understand and produce language incrementally on a word-by-word basis. This gives rise to many characteristic conversational phenomena including long mid-sentence pauses that are followed by incremental clarification requests (iCRs) intended to recover the rest of the truncated turn (see Fig. 1; (A), (B), (C)). The ability to generate iCRs is important in natural conversational AI systems, and crucial to their accessibility to users with memory impairment. In this paper, we collect, release and analyse sluice-cr: a large corpus of 3000 human-produced iCRs. We then use this corpus to probe the incremental processing capability of a number of state-of-the-art LLMs by evaluating the quality of the models' generated iCRs in response to incomplete questions. Our evaluations show that the ability to generate contextually appropriate iCRs only emerges at larger LLM sizes, and only when prompted with example iCRs from our corpus. They also indicate that autoregressive LMs are, in principle, able to both understand and generate language incrementally.
Original language | English |
---|---|
Title of host publication | Proceedings of the Joint International Conference on Computational Linguistics, Language Resources and Evaluation 2024 |
Editors | Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue |
Publisher | European Language Resources Association |
Pages | 3242-3249 |
Number of pages | 8 |
ISBN (Print) | 9782493814104 |
Publication status | Published - 20 May 2024 |
Event | Joint International Conference on Computational Linguistics, Language Resources and Evaluation 2024, Lingotto Conference Centre, Torino, Italy. Duration: 20 May 2024 → 25 May 2024. https://lrec-coling-2024.org/ |
Conference
Conference | Joint International Conference on Computational Linguistics, Language Resources and Evaluation 2024 |
---|---|
Abbreviated title | LREC-COLING 2024 |
Country/Territory | Italy |
City | Torino |
Period | 20/05/24 → 25/05/24 |
Internet address | https://lrec-coling-2024.org/ |
Keywords
- clarification
- conversational AI
- corpus
- dialogue
- evaluation
- incremental
- LLM
ASJC Scopus subject areas
- Theoretical Computer Science
- Computational Theory and Mathematics
- Computer Science Applications