Why Businesses Regret Using Self-Build Solutions to Create Virtual Assistants

February 19, 2020


When it comes to building a virtual assistant, many companies will turn to bot-building platforms like Google Dialogflow, Amazon Lex and Microsoft Bot Framework. The appeal of developing virtual assistants in-house is understandable, but it’s a decision that many companies come to regret.

In this post, we’ll look at the pitfalls of bot-building platforms, highlighting the key considerations you should keep in mind if you’re thinking of developing your virtual assistant in-house.


While bot-building platforms are billed as straightforward, companies will still need to create a team in order to deliver a virtual assistant successfully.

A decent bot-building team will comprise at least one project manager, a voice user interface designer, an API developer, an implementation engineer and a tester. This team will likely cost in excess of £300,000 annually, not including tax and overheads. It’s worth noting that experienced machine learning engineers and voice designers are still pretty rare, and hiring a decent team will take time.

Once a team is in place, companies should budget 3-6 months for the team to develop a prototype. This doesn’t include the time the team may need for research and training to get up to speed with the technology requirements. It’s not uncommon for these development projects to take more than twelve months to reach a deployment-ready state, and we know of FTSE 250 companies that have spent over a year developing solutions that ended up getting binned.


It’s easy to take voice interface design for granted – after all, we all know how to hold a conversation.

But designing for voice is much more difficult than it first appears. Unfortunately, many of these development teams don’t discover this until the solution is live in front of – and massively failing – customers.

The thing many developers fail to consider is that you don’t actually want your virtual assistant to speak in the same way that a person would. The way we communicate with virtual assistants is very different to how we communicate with people and this needs to be reflected in design.

Good voice designers will understand that the language the assistant uses will influence the responses a user gives. They will know how to use sound to alleviate users’ concerns mid-conversation, and they will understand the variety of routes users can take through conversations.

Take the following example of a straightforward flow for making a restaurant reservation.

Building an assistant to follow this flow would be relatively simple. But in real life, people don’t talk like this. They change their minds and ask questions. So the flow ends up looking more like this…

And this is still a simple example. Think of all the possible questions your customers could ask, all the ways in which they could change their minds, and all the different routes they might want to take through a conversation.

A quick Google search will turn up tons of useful free resources on voice interface design, but inexperienced designers will need time to get to grips with the discipline, and this learning needs to be factored into your development timeline and budget. Even then, voice interface design is an extremely specialised field. Inexperienced designers will inevitably make mistakes, and those mistakes won’t be picked up until the system is live and talking to customers.


It’s important to consider how your virtual assistant will slot into the desired workflow. Does it need to integrate with CRMs, booking platforms, inventory management software, delivery partners, payment providers, telephony systems and so on?

Any integration will need to be fully scoped, and it’s worth remembering that some will be much more complex than others. Less experienced teams will need to spend a good amount of time researching integrations, and may even find themselves needing to negotiate terms and collaborate with third-party platforms to gain access to the right information.

Training and application performance

Out-of-the-box bot-builders still require a significant amount of work in terms of training recognition models. It will be your development team’s responsibility to train the model how to understand customers and how to respond to them. This means spending hours thinking up ways in which customers might phrase their queries and inputting these into the bot-builder dashboard. This gets especially complicated when you’re building an assistant to listen out for similar intents. Without a disciplined process for mapping training phrases to a particular intent, you’ll see ‘cross contamination’ where the assistant mistakes intents and provides nonsensical responses.
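One way to keep that mapping disciplined is to check your training phrases for overlaps before you upload them. The sketch below is a minimal illustration in Python, not a feature of any particular bot-builder: the `training_data` layout, the intent names and the `find_conflicts` helper are all assumptions made for the example. It only catches phrases that are identical once case and spacing are ignored; near-duplicates would need a fuzzier comparison.

```python
from collections import defaultdict
from typing import Dict, List, Set


def normalise(phrase: str) -> str:
    """Lower-case the phrase and collapse repeated whitespace."""
    return " ".join(phrase.lower().split())


def find_conflicts(training_data: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Return every normalised phrase that is mapped to more than one intent.

    `training_data` is a hypothetical layout: intent name -> training phrases.
    """
    intents_by_phrase: Dict[str, Set[str]] = defaultdict(set)
    for intent, phrases in training_data.items():
        for phrase in phrases:
            intents_by_phrase[normalise(phrase)].add(intent)
    return {
        phrase: intents
        for phrase, intents in intents_by_phrase.items()
        if len(intents) > 1
    }


# Illustrative data only: these intents and phrases are made up.
training_data = {
    "book_table": ["I want a table", "Book a table for two"],
    "change_booking": ["Change my booking", "i want  a table"],
}

conflicts = find_conflicts(training_data)
for phrase, intents in conflicts.items():
    print(f"'{phrase}' is mapped to several intents: {sorted(intents)}")
```

Running a check like this each time new phrases are added is cheap, and it surfaces the most obvious source of confusion between intents before customers do.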

Speech recognition

How well your virtual assistant is able to recognise and transcribe words and phrases is down to the ASR technology you use. If you’ve tried talking with a virtual assistant before, you’ll understand just how unreliable ASR can be.

Using a bot-building platform means you are reliant on the ASR solution that platform uses. But in our experience, there is no one reliable ASR solution. Great results come from leveraging multiple ASR solutions and having them work together to get accurate results.

It also makes sense to bias your system to behave differently depending on the expected input. For example, ASR may have difficulty telling ‘8’ apart from ‘hate’, ‘ate’ or ‘h’. But if the assistant has just asked a customer for their phone number, you can bias your system to select a numerical value from the candidate transcriptions the ASR has returned. This type of tweaking is not possible when using a bot-building platform, but it is crucial for enterprise organisations.
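The biasing described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of how any specific ASR product works: it assumes the ASR returns a list of candidate transcriptions with confidence scores, and the `pick_hypothesis` function and the scores shown are hypothetical.

```python
from typing import List, Tuple

# Assumed shape of an ASR result: (transcription, confidence score).
Hypothesis = Tuple[str, float]


def pick_hypothesis(n_best: List[Hypothesis], expecting_digits: bool) -> str:
    """Choose a transcription, preferring numeric ones when digits are expected."""
    if not n_best:
        raise ValueError("n_best must contain at least one hypothesis")

    # Rank candidates from most to least confident.
    ranked = sorted(n_best, key=lambda hyp: hyp[1], reverse=True)

    if expecting_digits:
        for text, _score in ranked:
            # Accept the best-scoring candidate made up only of digits.
            if text.replace(" ", "").isdigit():
                return text

    # No bias applies, or no numeric candidate exists: use the top result.
    return ranked[0][0]


# Made-up scores for the '8' / 'hate' / 'ate' / 'h' confusion.
n_best = [("hate", 0.55), ("8", 0.30), ("ate", 0.10), ("h", 0.05)]

after_phone_prompt = pick_hypothesis(n_best, expecting_digits=True)
after_open_prompt = pick_hypothesis(n_best, expecting_digits=False)
print(after_phone_prompt, after_open_prompt)
```

The same list of candidates produces a different answer depending on what the assistant has just asked, which is the point of the bias: context decides between transcriptions that sound alike.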

Continuous improvements

As with any technology project, you will learn an awful lot about the requirements after the conversational agent has been deployed. Hindsight is a beautiful thing.

Bot-building platforms use machine learning to improve the virtual assistant’s performance based on observed behaviour, but continuous improvement on these platforms doesn’t come for free. It requires even more labelling effort from your developers, because the machine learning models that live on these platforms are not best in class: they are simply too data hungry.

For example, when you start adding new functionality to your assistant, you’ll see that the system becomes incredibly complex incredibly quickly. Adding a new step, or ‘node’, to your design will carry consequences for connecting nodes, and the more the system is able to understand and interact, the more time-consuming and expensive it will become to make changes.
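To get a feel for how quickly this grows, here is a rough back-of-the-envelope sketch in Python. It rests on a deliberately pessimistic assumption that is ours, not a property of any platform: that users can jump from any node to any other node, so every pair of nodes needs a transition in each direction. Real designs will have fewer connections, but the shape of the growth is similar.

```python
def worst_case_transitions(nodes: int) -> int:
    """Directed transitions when every node can lead to every other node."""
    return nodes * (nodes - 1)


def added_by_next_node(nodes: int) -> int:
    """Extra transitions to design and test when one more node is added."""
    return worst_case_transitions(nodes + 1) - worst_case_transitions(nodes)


for size in (5, 10, 20):
    print(
        f"{size} nodes: {worst_case_transitions(size)} transitions, "
        f"the next node adds {added_by_next_node(size)}"
    )
```

Under this assumption, each new node adds twice as many transitions as there are existing nodes, so every change costs more than the one before it.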

Conversational AI as a Service

Building a virtual assistant now is a bit like building a website fifteen years ago.

These days, anyone can build a website with a platform like Squarespace, and even if you wanted to develop your site from scratch in-house, you could hire a good team at a good price. But not that long ago, building a website required development skills that only a limited number of people had.

Bot-builders like Lex and Dialogflow will one day be as simple to use as Squarespace, but for now a huge degree of expertise and experience is required to use them, and even then the results are rarely acceptable for commercial use.

Over the next ten years, the same sort of technological advances that made website development simple will happen in bot-building. For now, companies that want results should make use of managed conversational AI services, where independent teams of experts use internally developed neural network models to build virtual assistants that directly meet your business requirements.