
Building trust and easing customer effort with PolyAI and NVIDIA Riva

June 11, 2025



Accurate listening is the foundation of PolyAI’s lifelike AI agents and of their ability to build trust with callers in customer service. Effective communication begins with listening, and that capability must be robust to background noise, poor phone-signal quality, and the accents, dialects, and cadence specific to each caller’s way of speaking.

With PolyAI Owl, our in-house speech recognition model fine-tuned on NVIDIA Riva, PolyAI voice AI agents are more accurate in understanding what’s being said – and what’s meant – across the millions of enterprise customer service conversations that we handle each month.

“The combination of NVIDIA Riva and PolyAI’s know-how around spoken language understanding allows our AI agents to reduce customer frustration and build trust during the most critical customer conversations for the enterprises we work with,” said Shawn Wen, CTO and Co-Founder of PolyAI. “Lower word error rates, more contextual awareness, better control over latency and interruptions are all benefits we’ve seen since deploying PolyAI Owl fine-tuned on NVIDIA Riva.”

Building trust in conversations with lower word error rates

With NVIDIA Riva, PolyAI has access to cutting-edge foundation models trained on vast amounts of speech data. We enhance these models by fine-tuning them on our proprietary telephony and conversational datasets, including synthetically generated speech and weakly labeled conversational audio. With PolyAI Owl, this approach lowers word error rates for our voice AI agents and allows full model customization for different accents, noisy environments, and the poor audio quality typical of phone calls, so that each client’s unique use case gets optimal performance.
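Word error rate (WER) is the standard metric behind claims like these: the number of word substitutions, deletions, and insertions needed to turn a recognizer’s transcript into the reference transcript, divided by the number of words in the reference. The sketch below computes it with a word-level edit distance. It is a generic illustration of the metric, not PolyAI’s evaluation pipeline, and it assumes whitespace tokenization with no text normalization (real evaluations usually normalize case, punctuation, and numbers first).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum number of edits to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost is 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "my power went out this morning"
    hyp = "my tower went out this morning"  # one misheard word on a noisy line
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
```

In this example a single substitution in a six-word reference gives a WER of 1/6. On a phone line, that one error ("tower" for "power") is enough to route an outage call to the wrong intent, which is why small WER improvements matter.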

How better language understanding can reduce effort and boost CSAT

For Pacific Gas & Electric, a leading Fortune 500 utility company, better understanding of spoken language over the phone enabled PolyAI voice AI agents to achieve a 25% reduction in customer effort through more efficient conversations, and a 22% increase in CSAT for outage calls, compared with their previous solution.

Reducing frustration by optimizing latency and interruptions

NVIDIA Riva’s inference speed is crucial for ensuring that PolyAI Owl processes audio fast enough to keep pace with a natural phone conversation. We achieve high inference performance through tight integration with the NVIDIA Riva inference server, minimizing latency where every millisecond counts. This gives callers confidence that they have been understood correctly, which enables us to handle the most complex and critical conversations along the customer journey.
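A common way to check whether a streaming recognizer keeps pace with live audio is the real-time factor (RTF): processing time divided by the duration of the audio processed. An RTF below 1 means the system transcribes faster than the caller speaks. The sketch below measures it over a stream of audio chunks. It rests on stated assumptions: `recognize_chunk` is a hypothetical stand-in for any streaming recognizer (it is not a Riva API), and the audio is assumed to be 8 kHz, 16-bit mono PCM, a typical telephony format.

```python
import time
from typing import Callable, Iterable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 keeps up with live speech."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def measure_streaming_rtf(
    chunks: Iterable[bytes],
    recognize_chunk: Callable[[bytes], str],  # hypothetical recognizer callback
    sample_rate: int = 8000,     # assumed telephony sample rate (Hz)
    bytes_per_sample: int = 2,   # assumed 16-bit mono PCM
) -> float:
    """Feed chunks to the recognizer and return the overall real-time factor."""
    total_audio = 0.0
    total_processing = 0.0
    for chunk in chunks:
        # Duration of this chunk in seconds, derived from its byte length.
        total_audio += len(chunk) / (sample_rate * bytes_per_sample)
        start = time.perf_counter()
        recognize_chunk(chunk)
        total_processing += time.perf_counter() - start
    return real_time_factor(total_processing, total_audio)


if __name__ == "__main__":
    # Ten 100 ms chunks of silence and a dummy recognizer that does no work.
    silent_chunks = [bytes(1600) for _ in range(10)]
    rtf = measure_streaming_rtf(silent_chunks, lambda chunk: "")
    print(f"RTF: {rtf:.6f}")
```

RTF is only part of the picture: a system can have an RTF well below 1 and still feel slow if it buffers large chunks, so per-chunk latency is usually tracked alongside it.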

Another important part of reducing frustration over the phone is knowing when a caller has finished their thought, so that the AI agent can identify the most relevant action and business process to resolve their issue.

NVIDIA Riva’s customizable voice activity detection (VAD) algorithms provide tunable parameters that are key to how our AI agents handle barge-in, interruptions, and natural turn-taking during noisy phone calls.

The best way to handle interruptions varies by brand and use case; it depends entirely on what matters to the business and to the customer at each moment in the conversation. Control over VAD parameters gives PolyAI the flexibility to adapt the experience to what is important for each brand.
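To illustrate how a few tunable parameters change turn-taking behavior, here is a minimal endpointing sketch built on per-frame speech probabilities. It is not NVIDIA Riva’s VAD or its API: the parameter names in `EndpointConfig` are hypothetical, the per-frame probabilities are assumed to come from some upstream speech classifier, and frames are assumed to be 20 ms long. A turn opens only after several consecutive speech frames, which keeps a brief noise spike from counting as barge-in, and it closes only after a run of non-speech frames.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class EndpointConfig:
    # Hypothetical parameter names, for illustration only.
    speech_threshold: float = 0.5  # frame counts as speech at or above this probability
    start_frames: int = 3          # consecutive speech frames needed to open a turn
    end_silence_frames: int = 25   # consecutive non-speech frames needed to close it (25 x 20 ms = 500 ms)


def detect_turns(speech_probs: Sequence[float], cfg: EndpointConfig) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices for each caller turn; `end` is exclusive."""
    turns: List[Tuple[int, int]] = []
    in_speech = False
    speech_run = 0
    silence_run = 0
    start = 0
    for i, p in enumerate(speech_probs):
        is_speech = p >= cfg.speech_threshold
        if not in_speech:
            speech_run = speech_run + 1 if is_speech else 0
            if speech_run >= cfg.start_frames:
                in_speech = True
                start = i - cfg.start_frames + 1  # turn began at the first frame of the run
                silence_run = 0
        else:
            silence_run = 0 if is_speech else silence_run + 1
            if silence_run >= cfg.end_silence_frames:
                # The turn ends where the closing silence run began.
                turns.append((start, i - cfg.end_silence_frames + 1))
                in_speech = False
                speech_run = 0
    if in_speech:
        turns.append((start, len(speech_probs)))  # input ended mid-turn
    return turns


if __name__ == "__main__":
    # Caller speaks, pauses for 8 frames (160 ms), then finishes the sentence.
    probs = [0.1] * 5 + [0.9] * 10 + [0.1] * 8 + [0.9] * 5 + [0.1] * 30
    patient = EndpointConfig(end_silence_frames=25)
    snappy = EndpointConfig(end_silence_frames=6)
    print("patient:", detect_turns(probs, patient))
    print("snappy: ", detect_turns(probs, snappy))
```

With these inputs the patient profile treats the short pause as part of one turn, (5, 28), while the snappy profile splits it into two turns, (5, 15) and (23, 28). The first suits callers who pause to read an account number; the second suits brands that want the fastest possible response.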

 

Ready to hear it for yourself?

Get a personalized demo to learn how PolyAI can help you drive measurable business value.
