
Navigating the voice assistant landscape

April 26, 2023


The conversational AI marketplace is incredibly crowded, with an estimated two thousand vendors.

Navigating the marketplace is an ongoing challenge for companies that want to leverage these technologies to enhance customer engagement, streamline operations, and improve their bottom line.

With vendors making similar claims of cost savings, call deflection, and improved customer satisfaction, it’s difficult to know where to start.

In this post, we’ll look at the current state of conversational AI vendors, from first-generation IVR to voice-first assistants, and the factors to consider when integrating these solutions into your organization.

1. Generation 1 IVR

Interactive Voice Response (IVR) systems were a breakthrough technology in the 70s and 80s. They allowed callers to respond to prompts spoken by the IVR using Dual-Tone Multi-Frequency (DTMF), or “Touch-Tone,” keypresses (e.g., press 1 for this, press 2 for that).
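To make the touch-tone model concrete, here is a minimal Python sketch of how a Generation 1 menu maps keypresses to destinations. The digits, options, and queue names are hypothetical, and this illustrates the idea only; it is not how any particular IVR product is configured.

```python
# A minimal sketch of a Generation 1 touch-tone (DTMF) menu.
# The digits, options, and queue names are hypothetical examples.
MENU = {
    "1": "billing",
    "2": "appointments",
    "3": "technical_support",
    "0": "live_agent",
}


def route_keypress(digit: str) -> str:
    """Map one DTMF keypress to a destination queue.

    A key that isn't on the menu simply replays the prompt: the
    system has no way to ask what the caller actually meant.
    """
    return MENU.get(digit, "replay_menu")


if __name__ == "__main__":
    for key in ["2", "0", "7"]:
        print(f"pressed {key} -> {route_keypress(key)}")
```

Every caller has to fit their problem into one of a handful of digits; anything that doesn't fit cleanly gets forced into the closest option or bounced back to the menu.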

By the 90s, IVR systems began to support limited speech recognition and Natural Language Processing (NLP), allowing callers to respond to prompts with spoken keywords instead of touch-tone.

Why would you use Generation 1 IVR?

Generation 1 IVR systems offer a basic level of automation that enables contact centers to support customers even outside of regular business hours, with tasks such as account inquiries or appointment scheduling. These systems can help cut costs by reducing the number of agents needed to handle routine tasks.

The limitations of Generation 1 IVR

While Generation 1 IVR systems have some benefits, they are limited in the types of interactions they can support.

Callers are presented with a restrictive menu of options and must decide which one most closely matches their problem. As a result, Generation 1 IVRs have a reputation for misrouting calls, and many callers deliberately press the wrong option in the hope of bypassing the system.

The unmistakably robotic voice of IVRs can also feel impersonal. Callers who are upset or have complex queries expect a more empathetic and personable experience, and being blocked by an IVR often only increases their frustration.

Finally, because these systems don’t track or analyze customer interactions, companies lack insight into the customer journey, such as exactly when and why customers are calling. Without structured data, it is hard to improve the customer experience.

2. Generation 2 IVR

By the early 2000s, improvements in Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) led to a wave of “Generation 2” IVR.

These systems took advantage of several technological advances, such as increased CPU power and the standardization of speech and voice systems. They could also be deployed as an integrated component of an existing call center platform.

Generation 2 IVR was the first attempt to shift the caller interface away from touch-tone and toward language-driven interaction.

Why would you use Generation 2 IVR?

A more conversational approach enables callers to describe their query rather than using a phone’s keypad to navigate complex menu options. As a result, these systems create a more engaging experience for customers.

The limitations of Generation 2 IVR

Early adopters of these systems were working with previous-generation ASR and NLU technologies. Because some customer issues are complex, callers often use unpredictable phrasing that these technologies struggle to understand accurately.

These systems rely on callers saying specific keywords to move the conversation along, rather than supporting natural conversation. Callers who aren’t native speakers of the IVR’s language may also struggle.
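The keyword dependence can be illustrated with a minimal Python sketch of keyword spotting over a transcript. It assumes the transcript has already been produced by an upstream ASR engine, and the intents and keyword lists are hypothetical; real Generation 2 systems varied widely in how they matched speech to intents.

```python
import re
from typing import Optional

# A minimal sketch of keyword spotting, as used by keyword-driven
# Generation 2 IVRs. The intents and keyword lists are hypothetical.
KEYWORDS = {
    "billing": {"bill", "invoice", "payment"},
    "cancel": {"cancel", "cancellation"},
}


def spot_intent(transcript: str) -> Optional[str]:
    """Return the first intent whose keywords appear in the transcript.

    The transcript is assumed to come from an upstream ASR engine.
    Matching is exact and word-based, so a caller who describes the
    problem without using a listed keyword is not understood.
    """
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    for intent, keywords in KEYWORDS.items():
        if words & keywords:
            return intent
    return None  # the IVR would typically re-prompt here


if __name__ == "__main__":
    for utterance in [
        "I want to pay my bill.",
        "I was charged twice last month",
        "Please cancel my subscription",
    ]:
        print(f"{utterance!r} -> {spot_intent(utterance)}")
```

The second utterance is plainly a billing problem, but because it contains none of the listed keywords it falls through to a re-prompt. That is the kind of unpredictable phrasing keyword-driven systems handle poorly.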

Deployment and maintenance of the ASR and NLU capabilities of these systems are notoriously complicated and costly. The language and speech recognition models demand significant data and training time to achieve accuracy, and the dialogue design is typically done using highly rigid, logic-based decision trees.

3. Contact Center as a Service (CCaaS)

The emergence of cloud-based technology brought significant changes to the customer service industry. Legacy on-premises contact center providers faced new market entrants, and companies began adopting CCaaS solutions.

CCaaS vendors often include Generation 1 and 2 IVR technology as part of their offering. Some have extended their technology stacks to deliver virtual agents capable of engaging across voice and digital channels.

CCaaS adoption continues to grow quickly, as does the catalog of features and services on offer. Conversational AI-powered chatbots, voice assistants, and insights and analytics tools are commonly included.

Why would you use CCaaS?

CCaaS solutions have become increasingly popular because they offer a more flexible and scalable approach to customer service and the ability to increase capacity as call volumes grow.

Moving the contact center into the cloud makes it easier for companies to integrate other technologies and surface data that can be used to streamline operations and improve customer experience.

Today’s CCaaS solutions enable organizations to deliver personalized and efficient customer experiences through multiple channels.

The limitations of CCaaS

CCaaS solutions can suffer from diluted capabilities as they seek to support a wide range of services, from chatbots to voice bots and conversational AI features.

Compared with voice-first solutions, the voice capabilities offered by CCaaS providers are often limited to Generation 1 and 2 IVR, which restricts callers’ ability to speak freely and naturally.

To deliver best-in-class voice assistants, CCaaS providers have to partner with voice-first providers, which requires additional technical resources, longer implementation times, and higher costs.

4. Conversational AI platforms

As many conversational AI technologies have become commoditized, the number of vendors in this space has grown explosively. Conversational AI platforms are now the largest and fastest-growing category of voice solutions in the marketplace.

These platforms typically provide a variety of tools for business users to build and maintain chatbots and virtual assistants themselves, and they span a wide range of use cases and products, including IT support, HR, and call center automation.

Why would you use a conversational AI platform?

These platforms take a graphical user interface approach to building voice assistants and are favored by organizations that prefer to keep development in-house. The no-code environment of these platforms is designed to make it easy for non-technical teams to develop and manage chatbots or virtual assistants.

The limitations of conversational AI platforms

Conversational AI platforms are not purpose-built for voice interactions. They often provide the development framework and technical infrastructure at a high up-front cost, and the expertise needed to build and maintain solutions on them often means paying for costly professional services.

While many conversational AI platforms handle the underlying NLP and speech algorithms, this usually comes at the cost of control. Organizations become locked into using a single supplier for every piece of the tech stack, preventing them from choosing the best-in-class technology in each area.

5. Voice-first conversational assistants

Best-in-class voice assistants focus on the unique challenge of spoken language over the phone.

Voice over the telephone is inherently challenging for speech recognition. Background noise, poor call quality, and a wide variety of dialects and accents make a caller’s words very difficult to understand.

To accurately resolve customer queries, voice-first conversational assistants must be able to do two things:

  1. Give customers the freedom to speak however they want. This means speaking in their own words, no matter how long or complicated the story, interrupting, asking questions, and diving in and out of different topics.
  2. Give customers the confidence that they can resolve their problems without needing to speak to an agent.

Best-in-class voice assistants require a unique technology stack, design methods, and implementation models. Giving callers freedom and confidence requires highly accurate ASR and NLU optimized for spoken language.

Why would you use voice-first conversational assistants?

Whereas many conversational AI vendors offer diluted voice capabilities as part of a wider service offering, voice-first vendors provide a technology stack tuned exclusively for voice conversations, which increases accuracy and keeps track of context throughout an entire conversation.

Building, deploying, and maintaining a voice assistant requires a team of project managers, voice user interface designers, API developers, implementation designers, and testers. By working with a voice-first vendor, organizations get access to a team of specialists without hiring anyone in-house and can be confident the solution is in expert hands.

Companies can deploy voice-first conversational assistants on a usage-based pricing model to control ongoing costs, paying only for the services they use.

The limitations of voice-first conversational assistants

There is a common misconception that working with a supplier of voice assistants will give you less control over the solution. In reality, working with the right vendor should feel like a collaborative partnership that extends your team’s capabilities to cover the many nuances of voice technology.

Working with a vendor that owns and maintains its own machine learning models will give you more control than working with platforms and large cloud providers that don’t allow any changes to their underlying technology.


Customer-led voice assistants empower callers to drive conversations the way they want and efficiently resolve their issues.

Partnering with a vendor specializing in voice and using a unique technology stack specifically developed for spoken interactions is the best way to reduce risk when deploying a voice assistant for your business. A specialized vendor can deploy a highly accurate voice assistant in as little as two weeks, with fast ROI.

If you care about giving your customers the freedom to speak naturally, making better operational decisions using structured conversational data, and launching customer experiences at scale, then a customer-led voice assistant is for you.

Ready to hear it for yourself?

Get a personalized demo to learn how PolyAI can help you drive measurable business value.

Request a demo