How voicebots handle languages & accents

January 7, 2021


Let’s face it. Understanding your customers isn’t always easy. They tell stories, go off track and they often can’t think of a good way to explain an issue to you. And those are just the customers that speak the same language as you.

Even for humans, it can be really difficult to understand people with strong or unfamiliar accents. Call center workers receive accent training to understand 35 different variations of English alone.

And what about communicating in different languages? For multinational organisations, speaking to customers in different languages means going the extra mile to hire multilingual staff, and making sure that calls actually go through to the agent who can speak the caller’s language.


The same phrase spoken in different languages

Organisations in English-speaking countries have an inherent advantage. English was very likely the first language used on the internet, and more text online means more training data, which leads to more accurate predictions by software. All this means that machines can ‘understand’ English better than other languages.

But even in English, accents remain a challenge. Speech recognition has advanced in leaps and bounds, but it’s not perfect. The exact same conversation with a New Yorker, an Australian, and a Serbian speaker of US English will produce three slightly different transcripts, and those differences are what trip up most voice assistants today. The result is the much-dreaded “sorry, can you repeat?” loop, even for native English speakers.

For organisations in non-English-speaking countries, or those that serve multicultural customers, the challenge is twofold. First, speech recognition solutions are less accurate in other languages, which immediately makes things more difficult. Second, having less training data in a particular language limits the accuracy of conversational models in that language. This is why it is much harder to build great conversational experiences for customers in, say, Italian, Latvian, or Singlish than it is in English.

But it’s not hopeless. From day one, PolyAI has focused on creating voice self-service experiences in any language, regardless of accent or slang. Our goal is to deliver enterprise-ready voice assistants that match and exceed those from household names, and our products achieve just that. We are purpose-built for multilingual voice applications, which means we do a few things differently…

Speech recognition optimisation

Some speech recognition solutions are better at understanding particular languages and accents than others. For example, the best solution for Polish may be different to the best solution for Thai.
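One standard way to decide which provider is “best” for a given language is to transcribe a small set of recorded test calls with each candidate and compare word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn a provider’s transcript into a human reference transcript, divided by the number of reference words. The article does not say which metric PolyAI uses, so treat the sketch below as an assumption; the provider names and transcripts are made up for illustration.

```python
# Hypothetical sketch: rank candidate speech recognition providers for one
# language by average word error rate (WER) against reference transcripts.
# Provider names and transcripts are invented for illustration.


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dist[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)


def best_provider(references: list[str],
                  outputs: dict[str, list[str]]) -> tuple[str, float]:
    """Return the provider with the lowest average WER, and that WER."""
    scores = {
        name: sum(word_error_rate(r, h) for r, h in zip(references, hyps))
        / len(references)
        for name, hyps in outputs.items()
    }
    winner = min(scores, key=scores.get)
    return winner, scores[winner]


if __name__ == "__main__":
    refs = ["i would like to book a table for two", "my postcode is e1 6an"]
    outputs = {
        "provider_a": ["i would like to book a table for two",
                       "my post code is e16 an"],
        "provider_b": ["i'd like to book a table for too",
                       "my postcode is e1 6an"],
    }
    print(best_provider(refs, outputs))
```

With these made-up transcripts, `provider_b` wins with an average WER of about 0.17, against 0.4 for `provider_a`. In practice the test set would need to cover the accents and line quality the assistant will actually face; a handful of utterances says little.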

Flexibility is key here because speech recognition solutions keep improving, which is why we test different speech recognition providers to find the best one for each project. On top of this, we use machine learning to add a further layer of optimisation at each stage of a conversation. This helps produce the most accurate transcriptions possible, whatever the call quality or accent.

Think of this as the technology equivalent of accent training, applied phrase by phrase.
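The article does not describe how this per-stage optimisation layer works. As a much simpler, rule-based stand-in for the idea, the sketch below rescores a speech recognizer’s n-best list (its ranked alternative transcripts) using what the conversation expects at the current step: when the assistant has just asked for a party size, a hypothesis containing a number gets a bonus. The `Hypothesis` type, stage names, patterns, and bonus value are all hypothetical, and a learned model would take the place of the hand-written patterns.

```python
import re
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    asr_confidence: float  # score from a hypothetical ASR engine, 0..1


# Hypothetical per-stage expectations: patterns describing answers that are
# plausible at that point in the conversation.
STAGE_PATTERNS = {
    "ask_party_size": re.compile(r"\b(one|two|three|four|five|six|\d+)\b"),
    # Simplified UK-style postcode, e.g. "e1 6an"
    "ask_postcode": re.compile(r"\b[a-z]{1,2}\d[a-z\d]?\s*\d[a-z]{2}\b"),
}


def rescore(hypotheses: list[Hypothesis], stage: str,
            bonus: float = 0.3) -> Hypothesis:
    """Pick the hypothesis with the best ASR confidence plus a bonus
    if it matches what the current stage expects."""
    pattern = STAGE_PATTERNS.get(stage)

    def score(h: Hypothesis) -> float:
        fits = bool(pattern and pattern.search(h.text.lower()))
        return h.asr_confidence + (bonus if fits else 0.0)

    return max(hypotheses, key=score)


if __name__ == "__main__":
    n_best = [Hypothesis("a table for too", 0.6),
              Hypothesis("a table for two", 0.5)]
    print(rescore(n_best, "ask_party_size").text)
```

Here the recognizer slightly prefers the mishearing “too”, but because the assistant has just asked for a party size, the rescored result is “a table for two”. At an unknown stage no bonus applies and the recognizer’s original ranking stands.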

Pre-trained speech encoder in multiple languages

Our state-of-the-art machine learning model is pre-trained on over a billion English conversations, which gives us a world-class foundation for natural language understanding.

We’re constantly building upon this foundation by pre-training our model in over 15 other European and non-European languages. Instead of building new models for different languages, we incorporate new languages on-demand in a matter of weeks. This enables us to achieve the fastest time to market for voice assistants that are truly capable of conversing with real customers in multiple languages.


PolyAI voice assistants are pre-trained in multiple languages

A multilingual value extractor

Our proprietary value extractor, ConVEx, is also trained in multiple languages.

This means that PolyAI voice assistants are able to accurately take down valuable information, such as names and addresses, in any language.
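A common way to frame value extraction is as sequence tagging: a model labels each token of the caller’s utterance as beginning (B), inside (I), or outside (O) a value, and the tagged spans are then read off. The sketch below shows only that final decoding step, with hand-written tags standing in for a model’s predictions; it illustrates the general technique and is not ConVEx’s actual implementation. Because decoding works on tokens and tags alone, the same step applies in any language, given a tokenizer for it.

```python
def extract_values(tokens: list[str], tags: list[str]) -> list[str]:
    """Collect the token spans tagged B (begin) / I (inside); O is outside."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    values: list[str] = []
    current: list[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new value starts; close any span that was still open
            if current:
                values.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            # Continue the open span
            current.append(token)
        else:
            # "O", or a stray "I" with no open span, closes any open span
            if current:
                values.append(" ".join(current))
            current = []
    if current:
        values.append(" ".join(current))
    return values


if __name__ == "__main__":
    # German: "My name is Anna Schmidt"
    tokens = ["Mein", "Name", "ist", "Anna", "Schmidt"]
    tags = ["O", "O", "O", "B", "I"]
    print(extract_values(tokens, tags))
```

For the German utterance above, `extract_values` returns `["Anna Schmidt"]`.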

Our research has shown that our multilingual approach outperforms monolingual models. For example, after we teach our model German on top of its English foundation, our voice assistants identify information given by callers more accurately than both a model trained only on English and a model trained only on German. This mirrors findings from other leading tech companies such as Facebook.

All this is to say, our voice assistants are better at collecting information from callers in different languages, making them uniquely suited for self-service experiences.

The future of multilingual customer service

Whether an organisation is trying to boost self-service for English-speaking or non-English-speaking customers, we see proofs of concept fall flat with real customers because of common issues that all trace back to three elements:

  1. The ability of speech recognition to deal with accents and imperfect audio signals;
  2. Robustness of speech encoders in each language; and
  3. Accuracy of value extraction when dealing with natural and informal ways of speaking.

These barriers are not insurmountable: the technology to build multilingual self-service experiences is available today, but it does require an extra level of craftsmanship. Look for companies like PolyAI that have both the proprietary technology and the research expertise to help you create new self-service experiences for your customers in any language.
