Podcast - Episode 81

Raven v2 and the Race to Smarter Voice Agents with Matt Henderson

About the show

Hosted by Nikola Mrkšić, Co-founder and CEO of PolyAI, the Deep Learning with PolyAI podcast is the window into AI for CX leaders. We cut through hype in customer experience, support, and contact center AI — helping decision-makers understand what really matters.

Summary

In this episode, your host Nikola Mrkšić sits down with Matt Henderson, VP of Research at PolyAI, to unpack the state of large language models, their quirks, and what really matters when building smarter voice agents for customer service.

Join us for a discussion on:

  • Why GPT-5 may feel more incremental than game-changing
  • Why reasoning models can fail at surprisingly simple tasks
  • How PolyAI's Raven outperforms generalist LLMs in latency-sensitive, real-world CX use cases
  • The balance between speed, accuracy, and reasoning for live customer interactions
  • What open-source models, quantization, and fine-tuning mean for enterprise AI strategies

👉 Learn more about Raven and conversational AI here: https://poly.ai/blog/polyai-raven-v2-large-language-model/

Key Takeaways

  • GPT-5 is incremental for voice: While GPT-5 was hyped, PolyAI’s own testing found it underwhelming for live voice tasks — sometimes less accurate than GPT-5 Mini. This highlights why PolyAI invests in purpose-built models for CX.
  • Raven v2 sets the bar: PolyAI’s in-house Raven v2 outperformed GPT-5 in latency-sensitive benchmarks, proving that specialized design beats general models when reliability and speed matter for enterprise voice AI.
  • Reasoning vs. real-time: GPT-5 emphasizes reasoning, but long deliberations don’t work in phone conversations. PolyAI is innovating with latency-aware reasoning — models that think just enough without keeping customers waiting.
  • From demos to deployment: Open-source releases and coding benchmarks grab headlines, but PolyAI focuses on enterprise-grade reliability — turning cutting-edge research into systems that work at scale in real contact centers.

Transcript

Nikola Mrkšić
00:05 – 00:45
Hi, everyone. Welcome to another episode of the PolyAI podcast.
Today with me is our VP of Research, Matt Henderson. My PhD followed Matt’s at the Dialogue Systems Group at Cambridge, and we’ve been working together for over a decade now.
I’m really excited to speak to you on this one. And I think, you know, the first thing I’d love to have you frame for the audience, because I always love the way you explain these things:
How do you feel about GPT-5? Now that it’s cooled off... I think I was very enthusiastic when we spoke about it with Sean last week, but I’d love to start with your take on what changed after that announcement.

Matt Henderson
00:45 – 00:53
Yeah. Thanks, Nikola.
And, yeah. Welcome to my first podcast appearance on any podcast.
Big moment for, for...

Nikola Mrkšić
00:53 – 00:56
Remember the first one now? Okay.

Matt Henderson
00:56 – 00:58
Yeah. Yeah.
Yeah, GPT...

Nikola Mrkšić
00:58 – 01:01
You’ll be the next host, and you can do the rest of them.

Matt Henderson
01:01 – 02:02
...five. I learned from the best.
Yeah. GPT-5 was interesting, I think.
Clearly, they were reluctant for a long time to bump the version number up to five, and you can see that on the OpenAI models page there’s, you know, 4, 4o, 4.1, 4.5, o1, o2... it’s a mess.
But now, I think you can see they wanted a clean “here’s five, this is what you get, this is everything” and to market it as one model. But there’s not really one model.
It’s a router routing to different models. So people on ChatGPT got a mixed experience, and I think what they say is there was a bug in the routing model, which meant that you were sometimes getting a less clever, non-reasoning version, and so it was failing on the sort of easy prompts.