Conversational AI is fast becoming integral in delivering excellent customer service over the phone. Voice assistants powered by conversational AI are now able to communicate with customers in the same way your agents would, giving your customers access to 24/7 support.
Once you’re ready to get started with conversational AI for voice, you’ll find that there are a number of different ways to design, build and deploy a voice assistant. Some of these methods seem cost-effective initially but quickly become prohibitively expensive. Others tend to hit roadblocks that prevent deployment or cause users to flee.
In this post, we’ve identified the four main approaches to deploying voice assistants and highlighted the differences between them to help you in your journey to deploying a successful voice assistant.
Some companies opt to build their own DIY voice assistant in-house, from scratch.
Bank of America did just that with their AI-driven virtual financial assistant: Erica. This completely self-owned system was created by a development team of 100 people in 2017. It took the team nearly 2 years to build, and cost an estimated $30 million. One Bank of America employee commented on the development of Erica, saying that the bank “learned [that] there are over 2,000 different ways to ask us to move money.”
However, not all companies have access to the resources and capabilities needed to pursue a DIY approach to conversational AI. DIY would seldom be the right strategy for innovative companies, as their competitive strengths lie elsewhere.
Conversational AI technology will become more modularized in the future, with separate innovations made in speech recognition, natural language processing (NLP), natural language generation (NLG), and speech synthesis. An active ecosystem of companies, from startups to tech giants, is already innovating in these spaces. It would be difficult for any single company to keep pace with innovation across all of these areas on its own.
In retrospect, it is easy to imagine a company resolving to build its own computer in the 1970s, and the likely consequence of that decision. Building your own voice-based conversational AI solution today would likely meet the same fate.
Many enterprises are experimenting with DIY conversational platforms such as Google Dialogflow, Amazon Lex and IBM Watson, or open-source frameworks like Rasa. These platforms take a graphical user interface approach to building conversational assistants and are favored by those who wish to keep development in-house.
However, few enterprises have put their platform-built voice assistants live with real customers. Their proofs-of-concept remain under lock and key months after the initial build, only interacting with testers through pre-scripted queries as designed by product teams.
By comparison, an initial proof-of-concept that handles 3 to 5 broad intents should be ready to face real customers within 2 weeks, and should expand easily to additional intents within 6 weeks of deployment, once it reaches a threshold level of intent accuracy and customer satisfaction.
These conversational platforms are not purpose-built for voice interactions. While household names like Google and Amazon have good ASR and NLU capabilities, a large part of building a working voice assistant involves orchestrating the performance of these component pieces to suit the use case. Even though these platforms aim to give the client more control, they lock the client into using a single supplier for every piece of the tech stack, preventing the client from choosing the best-in-class technology in each area.
Scalability is often a challenge for virtual assistants built with DIY platforms. Virtual assistants built with GUIs alone tend to resemble decision trees, offering little flexibility in how conversations can proceed. Exceptions and edge cases must be handcrafted, resulting in a complex web of dependencies that can quickly fall apart at the smallest tweak.
DIY platforms are often offered at a reasonable cost as part of a larger cloud package. However, the expertise required to build and maintain solutions, whether in-house or on a consulting basis, is both difficult to find and expensive. Many such projects run longer than expected. The cost of ownership is often not factored into the ROI on such projects, which is something that every enterprise should consider.
Third-party platforms do have a place in creating simple, logic-based chatbots, but they are insufficient for use-cases more complex than FAQs.
Many companies begin their conversational AI journey with a chatbot. It is a reasonable place to start; chatbots can be deployed on established channels, and they can help to deflect call volume. Many companies have had some success with chatbots as an alternative channel of customer service in the last decade.
However, converting a chatbot into a voicebot yields mixed results.
The most basic way to convert a chatbot into a voicebot is to add speech recognition to the voice input and text-to-speech for the output. The result will likely frustrate customers who rarely speak as precisely or concisely as they type.
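As a rough illustration of that basic conversion, the sketch below wraps a text chatbot with a speech-to-text step on the way in and a text-to-speech step on the way out. Every name here (`make_voicebot`, `fake_transcribe`, `keyword_chatbot`, `fake_synthesize`) is hypothetical, and the speech components are stand-ins that simply treat the audio bytes as UTF-8 text; a real deployment would call actual speech recognition and synthesis services.

```python
from typing import Callable


def make_voicebot(
    transcribe: Callable[[bytes], str],
    chatbot: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Wrap a text chatbot so that it accepts and returns audio."""

    def handle_turn(audio_in: bytes) -> bytes:
        text_in = transcribe(audio_in)   # speech recognition on the input
        text_out = chatbot(text_in)      # unchanged text-based chatbot logic
        return synthesize(text_out)      # text-to-speech on the output

    return handle_turn


# Stand-in components (assumptions for illustration only).
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def keyword_chatbot(text: str) -> str:
    # A keyword-driven bot, as is typical of simple chat solutions.
    if "balance" in text.lower():
        return "Your balance is available in the app."
    return "Sorry, I didn't understand that."


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


voicebot = make_voicebot(fake_transcribe, keyword_chatbot, fake_synthesize)
```

With these stand-ins, a precise, typed-style request such as "what is my balance" gets an answer, while a looser spoken phrasing of the same need that never mentions the keyword falls through to the fallback reply.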
Chat-based solutions look for keywords, but in speech, we often tell stories in full sentences and paragraphs. The noise in the voice channel is also high, whether it be literal background noise, filler words (umms and ahhhs), accents, slang or turns of phrase.
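One small piece of handling that noise can be sketched as follows: removing filler words from a transcript before any keyword or intent matching takes place. The `strip_fillers` function and the filler pattern are assumptions made for illustration; they cover only a few English fillers and do nothing about background noise, accents or slang.

```python
import re

# Hypothetical pattern for a few elongated English fillers: "um", "ummm", "uh", "ahhh", "er", "erm".
FILLER = re.compile(r"u+m+|u+h+|a+h+|e+r+m*")


def strip_fillers(transcript: str) -> str:
    """Lowercase a transcript, drop punctuation, and remove filler words."""
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    kept = [word for word in words if not FILLER.fullmatch(word)]
    return " ".join(kept)
```

For example, `strip_fillers("Umm, I need to, ahhh, move money")` returns `"i need to move money"`.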
Speech recognition involves reliably and accurately transcribing spoken utterances into text that your bot can process. Out-of-the-box speech recognition solutions from big cloud providers are about as good as they’re going to get, but they’re not perfect. These solutions require fine-tuning to cope with accents and to take context into account, so that when the bot has asked for a number and the caller says ‘4’, it knows to listen out for a number and can disregard non-numeric transcriptions like ‘fur’, ‘for’, ‘fork’ etc.
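The idea of listening out for a number can be sketched in a few lines. The example below assumes the speech recognizer returns a ranked list of alternative transcriptions (often called an n-best list) and that the dialogue knows when it has just asked for a number. The function name `interpret_number` and both lookup tables are hypothetical and deliberately tiny; this is not how any particular vendor implements contextual recognition.

```python
from typing import List, Optional

# Hypothetical lookup of spoken number words.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}

# Hypothetical near-homophones that a recognizer may emit instead of a number.
HOMOPHONES = {
    "for": 4, "fur": 4, "fork": 4, "fore": 4,
    "to": 2, "too": 2, "won": 1, "ate": 8,
}


def interpret_number(n_best: List[str], expecting_number: bool) -> Optional[int]:
    """Pick a number from ranked ASR hypotheses when the dialogue expects one."""
    if not expecting_number:
        return None
    tokens = [hypothesis.strip().lower() for hypothesis in n_best]
    # First pass: prefer any hypothesis that is already a number.
    for token in tokens:
        if token.isdecimal():
            return int(token)
        if token in NUMBER_WORDS:
            return NUMBER_WORDS[token]
    # Second pass: fall back to known sound-alikes.
    for token in tokens:
        if token in HOMOPHONES:
            return HOMOPHONES[token]
    return None
```

Here `interpret_number(["fur", "for", "four"], expecting_number=True)` returns `4`, whereas the same hypotheses return `None` when the dialogue is not expecting a number, leaving the utterance to be handled as ordinary speech.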
But just like humans, it’s impossible for a machine to ‘hear’ spoken inputs with 100% accuracy. Like humans, machines need to apply knowledge and context to what they ‘hear’ in order to fully understand.
Partnering with a vendor who specializes in voice, using speech technology specifically developed for spoken interactions, is the best way to reduce risk when deploying a voice assistant for your business.
Great CX is about flexibility. It’s about empowering customers to drive conversations the way they want. Customers should be able to express themselves in their own words, interrupt, ask questions and change their minds at any point in the conversation.
Human-level understanding is achieved through close collaboration across all layers of the conversational AI stack from speech recognition to dialogue management.
At PolyAI, we augment speech recognition to reduce transcription errors. We fine-tune our NLU model to increase accuracy in critical moments – those habitual pauses, mumbles, and clarifications – to make a conversation flow. We optimize dialogue management to account for context throughout the entire conversation, so your customers are always understood.
PolyAI’s voice assistants are pre-trained in common consumer intents such as ID&V, payments, bookings, and more. Powered by our ConveRT model (the most accurate understanding model on the market), our voice assistants can understand any customer intent out-of-the-box.
Thanks to our pre-trained model, we don’t require training data from clients. A couple of hours talking through common call flows is usually enough to design and build a custom voice assistant for your company, ready to deploy with real customers within just 2 weeks.
Working with a specialized voice AI automation vendor yields cost savings in a number of ways:
Because conversational AI is new and expertise varies, enterprises generally find it difficult to gauge and budget upfront investment for a DIY voice assistant project meant for deployment.
PolyAI has a per-minute pricing model that scales with your contact volume, at a significantly lower operating expense per call than traditional call centers. AI also provides the flexibility to handle daily peaks and troughs and seasonal effects without the need for additional staffing.
Voice assistants automate after-call work, eliminating wrap-up from handling time.
Unlike IVR, which offers only partial automation, PolyAI automates interactions end-to-end with up to 90% first call resolution. Where IVRs may reduce call handling times, PolyAI voice assistants are able to take a significant number of calls away from live agents, allowing them to provide fast responses to customer queries that require empathy and complex reasoning.
Don’t wait to implement your AI voice assistant. Talk to us about how PolyAI can help your business launch new customer experiences at scale, improving loyalty and retention, reducing call center costs and proving ROI within months.
In an initial meeting with you, we might discuss: