Join us at PolyAI VOX 2023
⚠️ Unsupported Browser

Your browser is not supported.

The latest version of Safari, Chrome, Firefox, Internet Explorer or Microsoft Edge is required to use this website.

Click the button below to update and we look forward to seeing you soon.

Update now

Data Collection and End-to-End Learning for Conversational AI

EMNLP 2019 Tutorial

3rd November, 2019.

Hong Kong, China.


A fundamental long-term goal of conversational AI is to merge two main dialogue system paradigms into a standalone multi-purpose system. Such a system should be capable of conversing about arbitrary topics (Paradigm 1: open-domain dialogue systems), and simultaneously assist humans with completing a wide range of tasks with well-defined semantics such as restaurant search and booking, customer service applications, or ticket bookings (Paradigm 2: task-based dialogue systems).

The recent developmental leaps in conversational AI technology are undoubtedly linked to more and more sophisticated deep learning algorithms that capture patterns in increasing amounts of data generated by various data collection mechanisms. The goal of this tutorial is therefore twofold. First, it aims at familiarising the research community with the recent advances in algorithmic design of statistical dialogue systems for both open-domain and task-based dialogue paradigms. The focus of the tutorial is on recently introduced end-to-end learning for dialogue systems and their relation to more common modular systems. In theory, learning end-to-end from data offers seamless and unprecedented portability of dialogue systems to a wide spectrum of tasks and languages. From a practical point of view, there are still plenty of research challenges and opportunities remaining: in this tutorial we analyse this gap between theory and practice, and introduce the research community with the main advantages as well as with key practical limitations of current end-to-end dialogue learning.

The critical requirement of each statistical dialogue system is the data at hand. The system cannot provide assistance for the task without having appropriate task-related data to learn from. Therefore, the second major goal of this tutorial is to provide a comprehensive overview of the current approaches to data collection for dialogue, and analyse the current gaps and challenges with diverse data collection protocols, as well as their relation to and current limitations of data-driven end-to-end dialogue modeling. We will again analyse this relation and limitations both from research and industry perspective, and provide key insights on the application of state-of-the-art methodology into industry-scale conversational AI systems.



Paweł Budzianowski . Machine Learning Engineer . PhD candidate @ Cambridge . EMNLP’18 Best Paper . ICASSP’18 Best Paper

Iñigo Casanueva . Machine Learning Scientist at Poly AI . PostDoc @ Cambridge . 10+ dialogue publications . PhD from Sheffield

Ivan Vulic  . Senior Scientist at PolyAI . Senior PostDoc @ Cambridge . 50+ top NLP publications . PhD from KU Leuven


EMNLP Tutorial 2019 slides