At PolyAI, our conversational agents are powered, in part, by machine learning models that detect the intent behind what a user says. For example, in a banking environment, if a customer says “When did you send me my new card?”, the models will detect that they’re enquiring about card arrivals and the agent will route them through the conversation accordingly.
Our intent models are powered by our state-of-the-art sentence encoder ConveRT model, which has been trained on billions of real-world sentences in order to capture the “meaning” of what the user says. In software, this “meaning” is encoded as a point in 1024-dimensional space. Given some training data, our intent classifiers learn to associate these encodings with user intents, e.g. whether the user is asking about card arrival, erroneous charges on their statement, exchange rates, etc.
An example of intent classification. An encoded point gets mapped to the best matching intent class.
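To make the shapes concrete, here is a minimal sketch of the encoding step. The encode_sentences function below is a stand-in of our own making (it returns random vectors), not ConveRT’s actual API; the only point it illustrates is that each sentence becomes a single 1024-dimensional vector.

```python
import numpy as np

# Placeholder for the sentence encoder: the real ConveRT model maps each
# sentence to a 1024-dimensional vector; here we only mimic the output shape.
def encode_sentences(sentences):
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(sentences), 1024)).astype(np.float32)

embeddings = encode_sentences(["When did you send me my new card?"])
print(embeddings.shape)  # (1, 1024)
```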
For most applications, we’ve been using neural networks to classify sentence “meanings” into particular intent categories. In this post, we’ll explore a way to do intent classification with a k-nearest neighbour (KNN) classifier.
KNN is seen as a “classic” machine learning algorithm, but in contrast to modern neural networks, it offers explainability (an insight into how the classifier makes its decision), as well as the ability to enable/disable certain intents depending on the context of the conversation, among other benefits. This usually comes at the expense of classification accuracy, but we’ll talk more about how to remedy this and how we harnessed the benefits of KNN later in this post.
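As a toy illustration of those two benefits (the function and intent names below are ours, not production code): the neighbours that drive a prediction double as an explanation, and intents can be switched off for a given conversational context simply by filtering them out of the vote.

```python
from collections import Counter

def vote_with_context(neighbour_intents, enabled_intents):
    """Majority vote over nearest-neighbour intents, restricted to the intents
    that are enabled in the current conversational context."""
    allowed = [i for i in neighbour_intents if i in enabled_intents]
    prediction, _ = Counter(allowed).most_common(1)[0]
    return prediction, allowed  # the surviving neighbours explain the decision

print(vote_with_context(
    ["CARD_ARRIVAL", "CARD_ARRIVAL", "EXCHANGE_RATE"],
    enabled_intents={"CARD_ARRIVAL", "EXCHANGE_RATE"},
))
```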
Let’s visualise the 1024-dimensional “meaning” vectors, called sentence embeddings, in 2D. We can think of a neural network classifier as predicting an intent for a sentence using learned decision boundaries. A KNN classifier, by contrast, works by searching for the k closest known embeddings (from the training data) to what the user said, and predicting the intent that the majority of those neighbours share.
A visualisation of KNN and neural network classification. Green and red could represent “AFFIRM” (e.g. user says “Yes”, “Agree”) and “DENY” (e.g. user says “No”, “I wouldn’t like that”) intent classes, respectively. In both cases, the query point (in white) will be predicted to be green = “AFFIRM”.
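For illustration, with made-up 2D points standing in for real sentence embeddings, scikit-learn’s KNeighborsClassifier implements exactly this majority vote, and its kneighbors method exposes which training examples drove the decision.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy 2D points standing in for sentence embeddings.
train_points = [
    [0.9, 1.0], [1.1, 0.8], [1.0, 1.2],         # "AFFIRM" examples
    [-1.0, -0.9], [-1.2, -1.1], [-0.8, -1.0],   # "DENY" examples
]
train_intents = ["AFFIRM"] * 3 + ["DENY"] * 3

# The prediction is a majority vote among the k closest training embeddings.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(train_points, train_intents)

print(knn.predict([[0.8, 0.9]]))     # ['AFFIRM']
print(knn.kneighbors([[0.8, 0.9]]))  # distances and indices of the 3 nearest neighbours
```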
It would be relatively straightforward to apply a KNN classifier now, but the embeddings we get from our encoder model are not particularly suitable for it. In the diagram below, the encoder embeddings are on the left-hand side: we see that intent clusters can be of different shapes, densities and distances from each other. For KNN to succeed, we’d like neat and tidy clusters that are well separated from each other. We can accomplish this by training another neural network that transforms embeddings into what we’ll call “geometrically-friendly” embeddings.
A visualisation of encoder embeddings (left) and embeddings we would like to have for KNN classification (right). Points from the same intent should be close together, but points from different intents should be a certain distance (margin) apart.
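The post does not spell out the architecture of this transformation network, so the sketch below is an assumption: a small feed-forward network that maps 1024-dimensional encoder embeddings to a lower-dimensional, L2-normalised space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingTransform(nn.Module):
    """Maps encoder embeddings to "geometrically-friendly" embeddings."""
    def __init__(self, in_dim=1024, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512),
            nn.ReLU(),
            nn.Linear(512, out_dim),
        )

    def forward(self, x):
        # L2-normalise so that Euclidean distances between intents are comparable.
        return F.normalize(self.net(x), dim=-1)

transform = EmbeddingTransform()
friendly = transform(torch.randn(4, 1024))
print(friendly.shape)  # torch.Size([4, 128])
```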
Learning such a clustering of embeddings is explored in the branch of machine learning called metric learning (see this paper on the topic). Since we’re training a neural network, we need a suitable loss function.
Intuitively, we would like our loss function to pull transformed points closer together if they represent the same intent (positive pairs), or push them further apart if they are of different classes/intents (negative pairs), but only if the distance between them is smaller than a given “margin”. This can be achieved with a triplet or contrastive loss, but here we use the lifted structure loss. The lifted structure loss makes use of all examples in a training batch to create positive and negative pairs, allowing the training to converge faster.
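Here is a simplified sketch of the lifted structure loss (Oh Song et al., 2016): every positive pair contributes a margin-based log-sum-exp over both points’ negatives plus the pair’s own distance, which is then hinged at zero, squared and averaged. This is an illustrative implementation, not our production code; libraries such as pytorch-metric-learning also ship a ready-made LiftedStructureLoss.

```python
import torch

def lifted_structure_loss(embeddings, labels, margin=1.0):
    """Simplified lifted structure loss over all positive pairs in a batch."""
    dist = torch.cdist(embeddings, embeddings)  # pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    neg_mask = ~same

    # For each anchor: log-sum-exp of (margin - distance) over its negatives.
    neg_term = torch.where(neg_mask, margin - dist, torch.full_like(dist, -1e9))
    neg_lse = torch.logsumexp(neg_term, dim=1)

    pos_mask = same & ~torch.eye(len(labels), dtype=torch.bool)
    losses = []
    for i, j in zip(*pos_mask.nonzero(as_tuple=True)):
        if i < j:  # count each positive pair once
            j_ij = torch.logaddexp(neg_lse[i], neg_lse[j]) + dist[i, j]
            losses.append(torch.relu(j_ij) ** 2)  # hinge at zero, then square
    return torch.stack(losses).mean() / 2

emb = torch.nn.functional.normalize(torch.randn(8, 128), dim=-1)
lbl = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(lifted_structure_loss(emb, lbl))
```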
Now we have everything to assemble the new intent classification system. In production, a user’s sentence would be first encoded by the encoder, then transformed by our new learned transformation function, and then finally classified by the KNN:
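Putting the pieces from the earlier sketches together (encode_sentences, EmbeddingTransform and scikit-learn’s KNeighborsClassifier are the hypothetical stand-ins introduced above, not our production system), inference might look roughly like this:

```python
import torch
from sklearn.neighbors import KNeighborsClassifier

train_sentences = ["Yes please", "Sounds good", "No thanks", "I wouldn't like that"]
train_intents = ["AFFIRM", "AFFIRM", "DENY", "DENY"]

# 1) Encode and 2) transform the training sentences, then 3) fit the KNN on the result.
transform = EmbeddingTransform()
with torch.no_grad():
    train_friendly = transform(torch.from_numpy(encode_sentences(train_sentences))).numpy()
knn = KNeighborsClassifier(n_neighbors=3).fit(train_friendly, train_intents)

def classify_intent(sentence):
    with torch.no_grad():
        friendly = transform(torch.from_numpy(encode_sentences([sentence]))).numpy()
    return knn.predict(friendly)[0]

print(classify_intent("When did you send me my new card?"))
```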
Here are some results of this setup, some of which we would not be able to achieve with a neural network-based intent classifier:
“Geometric” classifier accuracy on our test datasets, versus KNN and neural network (NN) intent classifiers. The numbers in brackets indicate the number of examples per intent to which a dataset was downsampled.
Overall, this is a good example of how using a different approach to classification, combined with a “classic” machine learning algorithm, can give more freedom in what we can do with our agents.
Edgar Liberis is a PhD student at the University of Oxford. He joined PolyAI for a remote machine learning internship over Summer 2020. As well as the work discussed in this blog post, Edgar contributed to a number of projects including zero-shot learning, out-of-domain point detection and assembling large text classification datasets.