How to Deploy Conversational AI for Voice

Kylie Whitehead
22 Jul 2021 - 7 minutes read

Conversational AI is fast becoming integral to delivering excellent customer service over the phone. Voice assistants powered by conversational AI can now communicate with customers in the same way your agents would, giving your customers access to 24/7 support.

Once you’re ready to get started with conversational AI for voice, you’ll find that there are a number of different ways to design, build and deploy a voice assistant. Some of these methods seem cost-effective initially but quickly become prohibitively expensive. Others tend to hit roadblocks that prevent deployment or cause users to flee.

In this post, we identify the four main approaches to deploying voice assistants and highlight the differences between them to help you deploy a successful one.



    DIY, from scratch

    Some companies opt to build their own DIY voice assistant in-house, from scratch. 

    Bank of America’s “Erica”

    Bank of America did just that with their AI-driven virtual financial assistant: Erica. This completely self-owned system was created by a development team of 100 people in 2017. It took the team nearly 2 years to build, and cost an estimated $30 million. One Bank of America employee commented on the development of Erica, saying that the bank “learned [that] there are over 2,000 different ways to ask us to move money.”

    However, not all companies have access to the resources and capabilities needed to pursue a DIY approach to conversational AI. DIY is seldom the right strategy even for innovative companies, because their competitive strengths usually lie elsewhere.

    Reinventing the Wheel

    Conversational AI technology will become more modularized in the future, with separate innovations made in speech recognition, natural language processing (NLP), natural language generation (NLG), and speech synthesis. An active ecosystem of companies, from startups to tech giants, is already innovating in these spaces. It would be difficult for any single company to keep pace with innovation across all of these areas in-house.

    In retrospect, we can easily imagine a company resolving to build its own computer in the 1970s, and the likely consequence of such a decision. Building your own voice-based conversational AI solution today would likely meet the same fate.

    Pros & Cons of DIY, from scratch approach

    Pros:

    • Purpose-built technology

    • Control

    Cons:

    • Extremely expensive

    • Likely to be displaced by specialist technologies in the future




    DIY, using a third-party platform

    Many enterprises are experimenting with DIY conversational platforms such as Google Dialogflow, Amazon Lex, and IBM Watson, or with open-source frameworks like Rasa. These platforms take a graphical user interface approach to building conversational assistants and are favored by those who wish to keep development in-house.

    However, few enterprises have put their platform-built voice assistants live with real customers. Their proofs-of-concept remain under lock and key months after the initial build, only interacting with testers through pre-scripted queries as designed by product teams.

    As a benchmark, an initial proof-of-concept that handles 3 to 5 broad intents should be ready to face real customers within 2 weeks. Once it reaches a threshold level of intent accuracy and customer satisfaction, it should be easy to expand to additional intents within 6 weeks of deployment.

    Limited Capabilities

    These conversational platforms are not purpose-built for voice interactions. While household names like Google and Amazon have good ASR and NLU capabilities, a large part of building a working voice assistant involves orchestrating these component pieces to suit the use case. Even though these platforms aim to give the client more control, they lock the client into a single supplier for every piece of the tech stack, preventing the client from choosing the best-in-class technology in each area.

    Limited Flexibility & Scalability 

    Scalability is often a challenge for virtual assistants built with DIY platforms. Virtual assistants built with GUIs alone tend to resemble decision trees, offering little flexibility in how conversations can proceed. Exceptions and edge cases must be handcrafted, resulting in a complex web of dependencies that can fall apart at the smallest tweak.

    High Cost of Ownership

    DIY platforms are often offered at a reasonable cost as part of a larger cloud package. However, the expertise required to build and maintain solutions, whether in-house or on a consulting basis, is both difficult to find and expensive. Many such projects run longer than expected. The cost of ownership is often left out of the ROI calculation for such projects, and every enterprise should account for it.

    Third-party platforms do have a place in creating simple, logic-based chatbots, but they are insufficient for use cases more complex than FAQs.

    Pros & Cons of DIY, using a third-party platform

    Pros:

    • Cheap to prototype

    • Available from existing cloud providers

    Cons:

    • Inflexible technology

    • Hard to scale

    • Rarely deployed for voice




    Port chatbot technology into voice use cases

    Many companies begin their conversational AI journey with a chatbot. It is a reasonable place to start; chatbots can be deployed on established channels, and they can help to deflect call volume. Many companies have had some success with chatbots as an alternative channel of customer service in the last decade.

    However, converting a chatbot into a voicebot yields mixed results. 

    Converting a Chatbot into a Voicebot Usually Fails

    The most basic way to convert a chatbot into a voicebot is to add speech recognition on the input side and text-to-speech on the output side. The result will likely frustrate customers, who rarely speak as precisely or concisely as they type.

    Chat-based solutions look for keywords, but in speech, we often tell stories in full sentences and paragraphs. The noise in the voice channel is also high, whether it be literal background noise, or filler words (umms and ahhhs), accents, slang or turns of phrase.
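To make the keyword problem concrete, here is a minimal Python sketch of a toy keyword-matching bot. The keyword table, intent names and example utterances are all invented for illustration and do not describe any particular product. The typed request maps cleanly to an intent, while a rambling spoken request trips the wrong keyword.

```python
import re

# Hypothetical keyword rules, checked in order (illustrative only).
KEYWORD_INTENTS = [
    ("balance", "check_balance"),
    ("transfer", "transfer_money"),
    ("card", "card_support"),
]


def keyword_intent(utterance: str) -> str:
    """Return the intent of the first keyword rule found in the utterance."""
    words = re.findall(r"[a-z']+", utterance.lower())
    for keyword, intent in KEYWORD_INTENTS:
        if keyword in words:
            return intent
    return "fallback"


# A typed request is short and uses the expected keyword.
typed = "transfer money"

# A spoken request tells a story and never says "transfer" at all.
spoken = (
    "Um, so I was checking my balance earlier and, uh, "
    "I think I need to move some money into savings"
)

print(keyword_intent(typed))
print(keyword_intent(spoken))
```

In this toy setup the spoken request is routed to the balance intent even though the caller wants to move money, which is the kind of mismatch the voice channel tends to expose.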

    Speech recognition involves reliably and accurately transcribing spoken utterances into text that your bot can process. Out-of-the-box speech recognition solutions from big cloud providers are about as good as they’re going to get, but they’re not perfect. They need tuning to cope with accents and to make use of context: if the bot has just asked for a number and the caller says ‘4’, the bot should know to listen out for a number and disregard possible non-numeric transcriptions like ‘fur’, ‘for’ or ‘fork’.

    But it’s impossible for a machine, just as it is for a human, to ‘hear’ spoken inputs with 100% accuracy. Like humans, machines need to apply knowledge and context to what they ‘hear’ in order to fully understand.
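One way of applying that context can be sketched as follows, assuming the speech recognizer returns a ranked list of alternative transcriptions (an n-best list) and the bot knows it has just asked for a number. The function name, word tables and sound-alike list are hypothetical; a production system would more likely rely on the recognizer's own biasing or grammar features.

```python
from typing import List, Optional

# Spoken number words the bot accepts directly (illustrative subset).
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}

# Hypothetical table of common mis-hearings, used only when a number is expected.
SOUND_ALIKES = {
    "for": 4, "fur": 4, "fork": 4, "fore": 4,
    "to": 2, "too": 2, "won": 1, "ate": 8,
}


def pick_number(n_best: List[str]) -> Optional[int]:
    """Choose a digit from ranked ASR hypotheses when a number is expected."""
    tokens = [hyp.strip().lower() for hyp in n_best]

    # First pass: prefer any hypothesis that is already a numeral or number word.
    for token in tokens:
        if token.isdecimal():
            return int(token)
        if token in NUMBER_WORDS:
            return NUMBER_WORDS[token]

    # Second pass: fall back to known sound-alikes of numbers.
    for token in tokens:
        if token in SOUND_ALIKES:
            return SOUND_ALIKES[token]

    # Nothing plausible: the bot should ask the caller to repeat.
    return None


print(pick_number(["for", "fur", "4"]))
print(pick_number(["fork"]))
print(pick_number(["hello"]))
```

The same hypotheses would be left alone in an open-ended turn; the re-ranking only applies because the dialogue state says a number is expected.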

    Pros & Cons of porting chatbot tech into voice use cases

    Pros:

    • You might already have a chatbot, with some of the component technologies required for a voice bot

    Cons:

    • Likely to deliver sub-par experiences on the voice channel




    Work With a Specialized Voice AI Automation Provider

    Partnering with a vendor who specializes in voice, using speech technology specifically developed for spoken interactions, is the best way to reduce risk when deploying a voice assistant for your business.

    Deliver Great CX Through Human-Level Understanding

    Great CX is about flexibility. It’s about empowering customers to drive conversations the way they want. Customers should be able to express themselves in their own words, interrupt, ask questions and change their minds at any point in the conversation. 

    Human-level understanding is achieved through close collaboration across all layers of the conversational AI stack from speech recognition to dialogue management. 

    At PolyAI, we augment speech recognition to reduce transcription errors. We fine-tune our NLU model to increase accuracy in critical moments – those habitual pauses, mumbles, and clarifications – to make a conversation flow. We optimize dialogue management to account for context throughout the entire conversation, so your customers are always understood.

    Fast deployment and low risk of execution

    PolyAI’s voice assistants are pre-trained in common consumer intents such as ID&V, payments, bookings, and more. Powered by our ConveRT model (the most accurate understanding model on the market), our voice assistants can understand any customer intent out-of-the-box. 

    Thanks to our pre-trained model, we don’t require training data from clients. A couple of hours talking through common call flows is usually enough to design and build a custom voice assistant for your company, ready to deploy with real customers within just 2 weeks. 

    High ROI

    Low stakes investment

    Working with a specialized voice AI automation vendor yields cost savings in a number of ways:

    • Predictable and reasonable upfront cost

    Because conversational AI is new and expertise varies, enterprises generally find it difficult to gauge and budget upfront investment for a DIY voice assistant project meant for deployment. 

    • Predictable and reasonable operating expenses

    PolyAI has a per-minute pricing model that scales with your contact volume, at a significantly lower operating expense per call than traditional call centers. AI also provides the flexibility to absorb daily peaks and troughs and seasonal effects without the need for additional staffing.

    • After-call work completed automatically

    Voice assistants automate after-call work, eliminating wrap-up from handling time.

    Low cost of ownership

    • Where many vendors charge obscene maintenance fees, PolyAI’s flexible dialogue architecture allows us to provide updates at a fraction of the cost.
    • PolyAI’s success-based pricing model incentivizes us to continuously optimize clients’ voice assistants for call containment. 

    Higher returns than Conversational IVRs

    Unlike IVR, which offers only partial automation, PolyAI automates interactions end-to-end with up to 90% first call resolution. Where IVRs may reduce call handling times, PolyAI voice assistants are able to take a significant number of calls away from live agents, allowing them to provide fast responses to customer queries that require empathy and complex reasoning. 


    Overview of Options for Building Voice Conversational AI

    Next Steps

    Don’t wait to implement your AI voice assistant. Talk to us about how PolyAI can help your business launch new customer experiences at scale, improving loyalty and retention, reducing call center costs and proving ROI within months.

    In an initial meeting with you, we might discuss: 

    1. How voice automation fits into your customer service program
    2. How to build a successful voice bot with minimal training data 
    3. How to capture your brand’s identity in the voice channel
    4. How to build a voice bot that understands a variety of accents
    5. How to port your voice bot into all the languages spoken by your customers 
