The State of Conversational AI Today

There’s been so much hype around conversational AI that – unless you’re a researcher in the field – it’s difficult to know what’s possible and what’s not. From sloppy Facebook Messenger chatbots to fully sentient robot friends – the spectrum of AI in popular culture and our everyday lives is broad.

The shift to conversation-driven interaction between humans and computers is well underway. Many people are now using smart speakers to request actions or information, and while AI-infused chatbots are discussed a lot in contact centres, there haven’t been many successful commercial deployments yet. The experience of the chatbots that are available is usually disappointing, leaving customers frustrated because their expectations have not been met.

The reality is that ubiquitous, all-powerful AI assistants are still very much science fiction. Conversational AI has come a long way in recent years, but it’s still a nascent technology, with a ton of limitations and challenges.

Right now, we have a good understanding of the building blocks of conversational AI, but many of the technologies in the field are not yet remotely good enough for real-world applications.

In this post, we’re going to look at the current state of conversational AI to demonstrate what’s possible right now, and what’s feasible in the near future.

Chatbots suck

If you’ve tried interacting with a chatbot, chances are you’ve had a pretty bad experience. That’s not to say that building useful chatbots isn’t possible; rather, it’s evidence that conversational AI has a ton of nuances and complexities that need to be considered.

Contemporary chatbots typically rely on decision-tree style conversational flows (think if-this-then-that style scripts). This approach is useful because it lets chatbot developers engineer and script conversations, but as soon as users start to express themselves in their own words, these chatbots simply can’t understand them.

The very idea of conversations as decision-trees is entirely unnatural. We pick up language subconsciously, and the way we put sentences together is completely internalised and unique to every person. Hand-crafting chatbots using decision trees means developers have to manually program the bots to understand every way that each particular intent could possibly be expressed (and that’s not to mention misspellings or misrecognition of words by voice recognition systems).
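
To make the problem concrete, here is a minimal sketch of a scripted bot. The phrases, intent names and the `scripted_intent` helper are all made up for illustration; the point is only that every phrasing has to be listed by hand, and anything else falls through.

```python
# Hypothetical hand-written script: each known phrasing maps to an intent.
SCRIPT = {
    "what are your opening hours": "opening_hours",
    "when do you open": "opening_hours",
    "i want to book a table": "book_table",
}


def scripted_intent(utterance: str) -> str:
    """Return the scripted intent for an exact phrase, or 'fallback'."""
    # Normalise case and trailing punctuation, then require an exact match.
    key = utterance.lower().strip(" ?!.")
    return SCRIPT.get(key, "fallback")


# A phrasing the developer anticipated is recognised...
print(scripted_intent("When do you open?"))
# ...but a perfectly natural rewording of the same question is not.
print(scripted_intent("How late are you around tonight?"))
```

Covering the second question means adding yet another entry to the script, and the same is true for every other rewording a customer might come up with.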

Before we look at the smarter alternative to chatbots, let’s look at another example of ‘ubiquitous conversational AI’ available on the market today.

Alexa, are you even listening?

At the moment, smart speakers like Alexa and Google Home, and on-device virtual assistants like Siri and Google Assistant are little more than automated typists. Whether you’re asking Alexa to play Taylor Swift, or checking the weather with Siri, you’re simply replacing a few clicks and keystrokes with voice commands.

That’s not to undermine the importance of the technology that makes these commands possible – voice recognition, intent classification (what does the user want?) and value extraction (what details do we need from the user?) are all complicated processes that are still being tackled by commercial and academic dialogue teams around the world. But this type of interaction is hardly a replacement for human conversation.
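
As a rough illustration of those two steps, here is a toy sketch. It assumes keyword lists and regular expressions where a real assistant would use trained models, and the intent names, keywords and helper functions are invented for this example.

```python
import re

# Hypothetical keyword lists; a production system would use a trained classifier.
INTENT_KEYWORDS = {
    "play_music": {"play", "listen"},
    "get_weather": {"weather", "forecast", "rain"},
}


def classify_intent(utterance: str) -> str:
    """Intent classification: what does the user want?"""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    scores = {name: len(words & kws) for name, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def extract_values(intent: str, utterance: str) -> dict:
    """Value extraction: which details do we need from the user?"""
    if intent == "play_music":
        match = re.search(r"\bplay\s+(.+)", utterance, re.IGNORECASE)
        return {"artist": match.group(1).strip(" .!?")} if match else {}
    if intent == "get_weather":
        # Assumes the place name is capitalised and follows the word "in".
        match = re.search(r"\bin\s+([A-Z][\w\s]*)", utterance)
        return {"city": match.group(1).strip(" .!?")} if match else {}
    return {}


utterance = "Alexa, play Taylor Swift"
intent = classify_intent(utterance)
print(intent, extract_values(intent, utterance))
```

Even in this toy form, the single command is being mapped to an intent plus a handful of values, which is a long way from holding a conversation.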

One of the biggest issues with virtual assistants is their inability to remember details and re-use them later in the conversation. The best example of this is when a user changes their mind about something. If you ask a virtual assistant where the nearest Chinese restaurant is, but then change your mind and say, ‘actually what about Indian’, it might not understand that you’re referring to an Indian restaurant.

It is possible to program virtual assistants to handle instances where users change their mind. If you ask one to set an alarm, and then you ask it to change it, it will probably do as you ask. In this case, it’s likely that the team have hand-crafted the technology to deal with a value change, but only in the context of an alarm.
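
One way to picture what ‘remembering details’ means is a small dialogue state that survives from one turn to the next. The sketch below is an assumption-laden toy, not how any particular assistant works: the `DialogueState` class, the cuisine list and the word matching are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

CUISINES = {"chinese", "indian", "italian", "thai"}  # illustrative list only


@dataclass
class DialogueState:
    """What the assistant remembers between turns."""
    intent: Optional[str] = None
    slots: dict = field(default_factory=dict)


def update_state(state: DialogueState, utterance: str) -> DialogueState:
    words = utterance.lower().replace("?", " ").replace(",", " ").split()
    # Only overwrite the intent when the user states a new one.
    if "restaurant" in words:
        state.intent = "find_restaurant"
    # A newly mentioned cuisine replaces the old value; the intent is kept.
    for word in words:
        if word in CUISINES:
            state.slots["cuisine"] = word
    return state


state = DialogueState()
update_state(state, "Where is the nearest Chinese restaurant?")
update_state(state, "Actually, what about Indian?")
print(state.intent, state.slots)
```

Because the second turn never mentions a restaurant, the assistant can only make sense of it by carrying the intent over from the first turn. Hand-crafting this for one context, like alarms or restaurants, is easy; doing it for every context is not.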

The future of reliable dialogue systems

There’s a wealth of research happening around deep learning, whereby machines train themselves on the available data, and reinforcement learning, whereby they learn via trial and error.

Both of these approaches hold vast potential, but are nowhere near reliable enough to scale commercially. While autonomous intelligent virtual assistants (IVAs) make sense in a research capacity, companies simply can’t take the risk of putting unstable assistants in front of customers (think about Microsoft’s infamous Tay bot, for example).

For now, AI will work alongside human support agents to intelligently route queries, simultaneously manage large numbers of third-party applications (think CRMs, booking platforms, payments systems, etc.), and provide responses to less complex queries, handing off to human agents when needed.
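
A hand-off rule of this kind can be sketched in a few lines. The threshold value, the intent names and the `route` function below are assumptions made for illustration, not a description of any real deployment.

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed cut-off; in practice this would be tuned
COMPLEX_INTENTS = {"complaint", "refund_dispute"}  # hypothetical examples


def route(intent: str, confidence: float) -> str:
    """Decide whether the assistant answers or hands off to a human agent."""
    # Hand off when the model is unsure, or when the query is known to be complex.
    if confidence < CONFIDENCE_THRESHOLD or intent in COMPLEX_INTENTS:
        return "human_agent"
    return "assistant"


print(route("opening_hours", 0.95))
print(route("opening_hours", 0.40))
print(route("complaint", 0.99))
```

The design choice here is to fail towards the human: when in doubt, the customer reaches a person rather than an assistant that is guessing.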

These hybrid AI solutions, built on top of existing machine learning frameworks, provide end users with a robust service and give companies peace of mind, while the underlying machine learning framework enables continuous improvement and the potential to scale.

While all-knowing AI assistants may be a way off yet, it is possible to automate elements of customer support to improve operational efficiency and provide an unbeatable customer experience.

At PolyAI, we continue to train our model (which has already processed billions of conversations to learn to understand massive variance in natural language) to understand and carry context throughout conversations. We can then train the model further with small amounts of context-rich, domain-specific examples to deploy reliable customer service solutions quickly.
