At PolyAI, our conversational agents are powered, in part, by machine learning models that detect the intent behind what a user says. For example, in a banking environment, if a customer says “When did you send me my new card?”, the models will detect that the customer is enquiring about card arrival, and the agent will route them through the conversation accordingly.
Our intent models are powered by ConveRT, our state-of-the-art sentence encoder, which has been trained on billions of real-world sentences in order to capture the “meaning” of what the user says. In software, this “meaning” is encoded as a point in 1024-dimensional space. Given some training data, our intent classifiers learn to associate these points with user intents, e.g. whether the user is asking about card arrival, erroneous charges on their statement, exchange rates, etc.
For most applications, we’ve been using neural networks to classify sentence “meanings” into particular intent categories. In this post, we’ll explore a way to do intent classification with a k-nearest neighbour (KNN) classifier.
KNN is seen as a “classic” machine learning algorithm, but in contrast to modern neural networks, it offers explainability (an insight into how the classifier makes its decision), as well as the ability to enable or disable certain intents depending on the context of the conversation, among other benefits. This usually comes at the expense of classification accuracy, but we’ll talk more about how to remedy this and how we harnessed the benefits of KNN later in this post.
Let’s visualise the 1024-dimensional “meaning” vectors, called sentence embeddings, in 2D. We can think of a neural network classifier as predicting an intent for a sentence using learned decision boundaries. A k-nearest neighbours (KNN) classifier, by contrast, works by searching the training data for the k known embeddings closest to what the user said, and predicting the intent that the majority of those neighbours share.
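To make the neighbour search and majority vote concrete, here is a minimal sketch in plain Python. The 2D points, the intent names, the choice of Euclidean distance and the value of k are all assumptions made for illustration; the real embeddings are 1024-dimensional and the post does not say which distance measure is used.

```python
import math
from collections import Counter


def knn_predict(query, train_embeddings, train_intents, k=3):
    """Return the majority intent among the k training points nearest to query."""
    # Rank every known embedding by its Euclidean distance to the query.
    ranked = sorted(
        zip(train_embeddings, train_intents),
        key=lambda pair: math.dist(query, pair[0]),
    )
    # Keep the intents of the k closest points and take a majority vote.
    neighbour_intents = [intent for _, intent in ranked[:k]]
    winner = Counter(neighbour_intents).most_common(1)[0][0]
    return winner, neighbour_intents


# Toy 2D "embeddings" (assumed data): two small clusters, one per intent.
train_embeddings = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (1.0, 1.0), (0.9, 1.1), (1.1, 0.9),
]
train_intents = ["card_arrival"] * 3 + ["exchange_rate"] * 3

predicted, neighbours = knn_predict((0.15, 0.15), train_embeddings, train_intents)
print(predicted, neighbours)
```

Returning the neighbours alongside the prediction is what gives KNN its explainability: the training examples that drove a decision can be inspected directly.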
It would be relatively straightforward to apply a KNN classifier now, but the embeddings we get from our encoder model are not particularly well suited to it. In the diagram below, the encoder embeddings are on the left-hand side: we see that intent clusters can be of different shapes, densities and distances from each other. For KNN to succeed, we’d like to have neat and tidy clusters that are well separated from each other. We can accomplish this by training another neural network that transforms embeddings into what we’ll call “geometrically-friendly” embeddings.
Learning such a clustering of embeddings is explored in the branch of machine learning called metric learning (see this paper on the topic). Since we’re training a neural network, we need a suitable loss function.
Intuitively, we would like our loss function to pull transformed points closer together if they represent the same intent (positive pairs), or push them further apart if they are of different classes/intents (negative pairs), but only if the distance between them is smaller than a given “margin”. This can be achieved with a triplet or contrastive loss, but here we use the lifted structure loss. The lifted structure loss makes use of all examples in a training batch to create positive and negative pairs, allowing the training to converge faster.
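As a rough illustration of the pull/push behaviour described above, the sketch below follows the published formulation of the lifted structure loss (Oh Song et al., 2016) in plain Python. It is not PolyAI's training code: the margin value and the toy 2D points are assumptions, and a real implementation would compute this inside an automatic differentiation framework so that the transformation network can be trained on it.

```python
import math


def lifted_structure_loss(embeddings, labels, margin=1.0):
    """Lifted structure loss over one batch, using every positive pair in it."""
    n = len(embeddings)
    dist = [[math.dist(embeddings[i], embeddings[j]) for j in range(n)]
            for i in range(n)]
    # Positive pairs: two different batch items that share an intent label.
    positives = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if labels[i] == labels[j]]
    if not positives:
        return 0.0

    total = 0.0
    for i, j in positives:
        # Negatives of i and of j contribute exp(margin - distance), so points
        # of another intent that sit close by dominate the sum.
        neg_terms = [math.exp(margin - dist[i][k])
                     for k in range(n) if labels[k] != labels[i]]
        neg_terms += [math.exp(margin - dist[j][l])
                      for l in range(n) if labels[l] != labels[j]]
        if not neg_terms:
            continue
        # Log-sum-exp over negatives (push apart) plus the positive
        # distance itself (pull together).
        pair_loss = math.log(sum(neg_terms)) + dist[i][j]
        total += max(0.0, pair_loss) ** 2
    return total / (2 * len(positives))


labels = ["card_arrival", "card_arrival", "exchange_rate", "exchange_rate"]
# Assumed toy batches: one with tidy, distant clusters and one with mixed-up intents.
separated = [(0.0, 0.0), (0.0, 0.1), (10.0, 10.0), (10.0, 10.1)]
overlapping = [(0.0, 0.0), (1.0, 1.0), (0.0, 0.1), (1.0, 1.1)]

loss_separated = lifted_structure_loss(separated, labels)
loss_overlapping = lifted_structure_loss(overlapping, labels)
print(loss_separated, loss_overlapping)
```

The tidy batch incurs no loss because every negative is already much further away than the margin, while the overlapping batch is penalised. That is the geometry the transformation is trained towards.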
Now we have everything to assemble the new intent classification system. In production, a user’s sentence would be first encoded by the encoder, then transformed by our new learned transformation function, and then finally classified by the KNN:
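The three stages can be sketched end to end as below. Everything named here is hypothetical: `toy_encode` is a keyword-count stand-in for the real sentence encoder, `toy_transform` is a simple normalisation standing in for the learned transformation, and the example sentences are invented. The `allowed_intents` filter shows one plausible way to enable or disable intents per conversation context; the post does not describe how this is done in production.

```python
import math
from collections import Counter

VOCAB = ["card", "send", "arrive", "exchange", "rate"]


def toy_encode(sentence):
    """Hypothetical stand-in for the encoder: keyword counts, not 1024-dim vectors."""
    words = sentence.lower().replace("?", "").split()
    return [float(words.count(term)) for term in VOCAB]


def toy_transform(embedding):
    """Hypothetical stand-in for the learned transformation: L2 normalisation."""
    norm = math.sqrt(sum(x * x for x in embedding)) or 1.0
    return [x / norm for x in embedding]


def build_index(examples):
    """Pre-compute transformed embeddings for all training sentences."""
    return [(toy_transform(toy_encode(text)), intent, text)
            for text, intent in examples]


def classify(sentence, index, k=3, allowed_intents=None):
    """Encode, transform, then take a KNN vote, optionally over a subset of intents."""
    point = toy_transform(toy_encode(sentence))
    candidates = [item for item in index
                  if allowed_intents is None or item[1] in allowed_intents]
    nearest = sorted(candidates, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(intent for _, intent, _ in nearest)
    # Return the winning intent and the training sentences that voted.
    return votes.most_common(1)[0][0], [text for _, _, text in nearest]


examples = [
    ("when will my card arrive", "card_arrival"),
    ("did you send my new card", "card_arrival"),
    ("where is my card", "card_arrival"),
    ("what is the exchange rate", "exchange_rate"),
    ("exchange rate for euros", "exchange_rate"),
    ("is the rate good today", "exchange_rate"),
]
index = build_index(examples)

intent, evidence = classify("When did you send me my new card?", index)
print(intent, evidence)
```

Because the index is just a list of labelled points, restricting the search to a subset of intents, or adding new examples, needs no retraining of the classifier in this sketch.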
Here are some of the results of this setup, some of which we would not be able to achieve with a neural network-based intent classifier:
Overall, this is a good example of how using a different approach to classification, combined with a “classic” machine learning algorithm, can give more freedom in what we can do with our agents.
Edgar Liberis is a PhD student at the University of Oxford. He joined PolyAI for a remote machine learning internship over Summer 2020. As well as the work discussed in this blog post, Edgar contributed to a number of projects including zero-shot learning, out of domain point detection and assembling large text classification datasets.