

Why do LLMs ramble on and on?


Summary

Large language models are powerful tools — but why do they ramble on and on instead of just getting to the point?

In this episode of Deep Learning with PolyAI, Nikola Mrkšić sits down with Oliver Shoulson, one of PolyAI’s original dialogue designers, to unpack why LLMs talk the way they do, and what that means for building natural-sounding voice AI.

They dive into:

  • Why LLMs are trained to over-explain, and how that breaks spoken dialogue
  • The subtle design choices that can make conversations feel human — without trying to fool anyone
  • The debate over anthropomorphism: should AI agents admit they’re AI, or lean into human-like traits?
  • How in-house LLMs can be fine-tuned to sound less like overeager interns and more like trustworthy teammates

Voice AI doesn’t need to trick us into thinking it’s human, but it does need to feel usable. This episode explores the hidden details of dialogue design that decide whether AI earns customer trust, or just frustrates them with endless rambling.

Learn more about how to avoid LLM slop in Oliver’s blog post: https://poly.ai/blog/how-not-to-talk-like-an-llm/

Key Takeaways

  • Transparency vs. trust: Early research showed that introducing an agent as AI can trigger “representative, representative” responses, but with the right design, call outcomes and satisfaction are just as strong — sometimes better — when the system is upfront.
  • LLMs talk too much: Large language models tend toward verbose, over-explanatory answers. That works for text, but in voice-based, task-oriented dialogues, it breaks the natural flow, making interruptions necessary.
  • Why PolyAI builds its own LLMs: In-house models are fine-tuned to avoid unhelpful text habits, adapt to conversational norms (like politeness and brevity), and handle linguistic nuance across languages and dialects — something off-the-shelf LLMs don’t optimize for.
  • Design for usability, not mimicry: The goal isn’t to trick people into thinking the agent is human, but to design usable, cooperative, and trustworthy systems. PolyAI’s dialogue design ethos: help users forget they’re speaking to a machine without pretending it’s a person.

Transcript

Nikola Mrkšić
00:11 – 00:47
Hello, everyone, and welcome to another episode of Deep Learning with PolyAI. Today, I’ve got Oliver Shoulson, who is one of our original dialogue designers and a guy that I’ve had many amazing conversations with over the years, and he’s taught me a lot.
And we’ll talk about some of the things that we had kinda, like, worked on, but we’re here to talk about dialogue design, anthropomorphism, whether or not an AI should say it’s human, and what putting an LLM as this kind of, like, main constituent part of its brain does to the flow of the conversation. So, Oliver, welcome to the podcast.

Oliver Shoulson
00:47 – 00:49
Thank you so much.

Nikola Mrkšić
00:49 – 01:18
It’s good to see you. And, you know, I think, like, when we discussed talking about this episode, we kinda, like, immediately went back to the wonderfully formatted, kinda, like, LaTeX paper where there was just stuff around, like, you know, should an AI say it’s human or not, and all the research we’d done beforehand on it.
So maybe we can start with just kinda, like, the broad strokes of, like, that discussion and see where it takes us.

Oliver Shoulson
01:18 – 02:01
Yeah. Sure.
I mean, this is a topic that we did a fair amount of, like, in house research on in the sort of the pre LLM days, where this question of, you know, should a bot introduce itself as a bot? Should we give it a human-like name? Should we give it a name that’s sort of not human like, like your classic like Siri or, you know, something that sort of lets you know that it’s like a special kind of entity, not a human. And we did a fair amount of research on this to see how this impacted user behavior, call outcomes, things like that.
Because, you know, I think there’s a lot of fear when you’re building and designing virtual assistants that if you tell people it’s a robot, that’s just gonna immediately trigger all of the past terrible experiences they’ve had with, like, terrible IVRs.

Nikola Mrkšić
02:01 – 02:02
Yep. Yep.

Oliver Shoulson
02:02 – 02:06
and you’re gonna immediately say, representative, representative, like, you know, hand me off.

Nikola Mrkšić
02:06 – 02:08
Yep.

Oliver Shoulson
02:08 – 03:29
And we do see that that has a significant effect in terms of first-turn handoff requests when you introduce the agent as a voice assistant, either calling it a virtual agent, mentioning AI, not mentioning AI. You know, when these studies were conducted, it was actually kind of right when ChatGPT was originally being rolled out.
So we were kind of interested to see, like, is this news making it to people that there are suddenly now these much more capable AI agents on the market? And what we find is that while there are effects in that first turn, on the frequency that people ask to be transferred to a human representative, through clever design and by trying to win people’s engagement back, we actually see no statistically significant differences in call outcomes at all, in terms of, like, what proportion of people we are able to contain within the system. Customer satisfaction is the same, if not a little bit better, when we are upfront about being a virtual assistant.
So this is one of those interesting design challenges where it’s like, we know that we’re going to encounter some people who, for one reason or another, are reluctant to interact with an AI. And how do we sort of demonstrate our capabilities to them and win back their engagement? And in doing so, we find that we can mitigate a lot of those negative outcomes.

Nikola Mrkšić
03:29 – 04:52
Yeah. Yeah.
I remember the civil wars we had internally about, you know, should you say it’s human or not? Then, you know, I think what we’re known for is just, like, the branded voice experiences: voice actors that we pull into a studio, work with for days, and then clone a voice that sounds exactly the way you want it to. Like, we took that to an art form before others had access to better TTS engines. Right? And this mattered.
It was at the very kind of, like, core of the product. Right? And there were people who strongly advocated that the voice itself should be quite robotic, that otherwise you’re entering the uncanny valley.
Then I think, you know, there was the, I think, brilliant moniker of, like, we’re not trying to, trick you into thinking it’s a human. We’re trying to help you forget that it’s a machine.
And I think that’s something that the whole team really rallied around. And then when we did that work, it just really got me how the one thing that mattered, I remember, was you can’t make a big deal out of it being AI.
Right? Because I think the historical bad experiences start with the whole “I am Nikola, PolyAI’s voice agent. You can talk to me in natural sentences, and you can use normal expressions that you would in daily conversations.
If at any point you want to speak to a human, just say human.” And at that point, you’re like, okay.

Oliver Shoulson
04:52 – 04:52
Yeah.

Nikola Mrkšić
04:52 – 05:07
That is like shooting yourself in the foot at the start of a, you know, nineteenth-century duel, and you should not do that. But I think, like, you know, once we were doing, like, this testing, everything else was really a minor variation, almost inconsequential.

Oliver Shoulson
05:07 – 05:25
And what we also see is that once you get past that first turn, that first interaction, the effects of basically anything that you say on the first turn diminish turn to turn throughout the rest of the conversation. So if you can get two, three, four turns into the conversation, the way that you introduce yourself has zero effect on anything, basically, by that point.

Nikola Mrkšić
05:25 – 05:53
Yeah. Because I think at that point, like, they’ve already invested in the interaction.
So either they believe you can help them or not. Right? Yeah.
And then okay. I think, like, the next pivotal point is the LLM.
Right? Like, people start using ChatGPT more. They realize its capabilities, but it produces a lot of text.
And, I mean, like, maybe just over to you. Like, if the system is LLM powered, especially by something like OpenAI, what happens?

Oliver Shoulson
05:53 – 07:14
Yeah. I mean, with voice agents, this is really interesting, and this is something I went on a lot about the last time I was on this podcast.
But, like, the way in which these LLMs are sort of inherently text-based makes them really biased toward long-form text output in ways that are just really unnatural for spoken, specifically task-based dialogue, where two people are trying to go through a transaction together. They’re trying to be cooperative.
Those sorts of long-form text outputs feel really unnatural, and they have a way of diminishing the sense of actually being with another person, which is really crucial when you’re collaborating on a task. You know, one thing I like to point out is that the over-explanatory nature of LLMs and the way that they’re very verbose, like, this is a feature, not a bug.
You know? Because LLMs generate text one token at a time in this sort of linear way, they’re trained in many cases to explain what they’re doing at every step, because that actually improves the reliability and consistency of their outputs. It helps connect point A of their response to point B of their response, which could be many tokens later.
So they’re designed to do this, but that means that when you’re plugging one into a voice assistant, there’s a lot of work to be done to actually mitigate some of those negative, like, text oriented habits that they have.

Nikola Mrkšić
07:14 – 07:19
Yeah. Because, I mean, like, put simply, you’re used to skimming through it and ignoring it.
Right?

Oliver Shoulson
07:19 – 07:19
Yeah.

Nikola Mrkšić
07:19 – 07:25
And I mean, do you use voice mode on ChatGPT?

Oliver Shoulson
07:25 – 07:30
A little, sometimes. To be honest, like, even I get a little creeped out by it.

Nikola Mrkšić
07:30 – 07:32
Really? Creeped out?

Oliver Shoulson
07:32 – 07:59
Maybe it’s that there’s too much social presence there. It’s like, really, I feel like I’m being with another person, in a way that, I don’t know if uncanny valley is the right word, because the degree of familiarity is, like, pretty high.
And, it’s like I feel like I’m policing my behavior in a way that when I’m talking to my Alexa, you know, I can be a little bit curt with her.

Nikola Mrkšić
07:59 – 08:03
You’re not really talking with Alexa. You’re issuing commands and praying they work.

Oliver Shoulson
08:03 – 08:03
Yeah.

Nikola Mrkšić
08:03 – 08:20
Right? I think with, have you noticed this thing when people, like, share their screen and they wanna show you their ChatGPT? It’s almost like a, you know, a bit of a boomer thing now, where you see their search history, and they’re not always aware of it. It tells you a lot about a person.

Oliver Shoulson
08:20 – 08:20
Yeah. It does.

Nikola Mrkšić
08:20 – 09:04
Almost like a snapshot. It’s like, oh, this person’s, like, looking at things about, like, this health issue. Clearly, they’ve got a confidence issue with x, right, or whatever.
Like, it’s just really, really interesting. But I think, like, to reveal a bit of, you know, my own, kinda, like, personal side:
I use voice mode a lot. I use it nonstop.
Like, and, you know, it started because of what we do. But then, really, it’s practical.
And my wife decided that I am to start grilling. Right? So she bought me a charcoal grill, not the one with, like, gas and stuff.
It’s still charcoal, and I’m not very good at it. Right? So, voice mode, and I’m like, okay.

Oliver Shoulson
09:04 – 09:05
You have a. difference.

Nikola Mrkšić
09:05 – 10:18
How long? This? When? Over the hot side or not? Like, when? Okay, this looks bad.
This is not hot enough yet. It’s starting to burn.
What should I do? And the one thing that’s, like, really interesting is, like, I think for the first five minutes, my wife was entertained by my, like, you know, vicious, you know, opinionated usage of this thing, where I’m like, no. Stop.
Short sentences. This is, like, Jesus Christ.
Like, when AI takes over, you’re gonna be among the first. I was like, no.
Because, you know, I’m engaging, unlike the rest of you.
But it’s like, you need to be extremely opinionated about how you want it. Because I’m like, I don’t need the, “Oh, I got it.
That’s totally the right way to do it.” I was like, no. Stop.
Like, no. Like, 150 or 200 degrees, or what? Quickly. I was like, no fluff.
Okay, I’ll be more curt.
And then, like, it got into, like, a vibe that I was okay with. It still just, like, won’t stop talking.
Like, it tends to be verbose, and, like, it hinges on, like, spitting out tokens on a screen fast enough that you, like, kinda, like, scan it and pay attention to the things that matter most.
But even when I use it as text, like, it pisses me off that you can’t just interrupt it easily there. So with voice, that’s actually more natural.

Oliver Shoulson
10:18 – 10:18
Yeah.

Nikola Mrkšić
10:18 – 10:25
but it makes for quite a weird interaction where you have to be an active interrupter in a way that you never would with another human.

Oliver Shoulson
10:25 – 10:26
Yeah.

Nikola Mrkšić
10:26 – 11:11
And, yeah, after some time, you know, I think ten minutes later, I’m there, sweating, and, you know, I didn’t set anything on fire. My wife’s like, are you done talking to your girlfriend there? I’m like, you know, the one key thing that Apple hasn’t given them is an ability to say stop listening.
You say stop listening, and it’s still listening. Right? And I’m just like, I’ll say, like, “Hey, GPT” again if I have to.
You can’t yet. You could start it through Siri, but the fact that you can’t stop it was genuinely the reason that I then was like, okay.
As I take off the grill glove, do I, like, go for another signal, or do I not? I’m just gonna chance it. So, like, the UX is getting there, but I feel like I am not representative.
I am a power user, and this is just a weird thing. But I’m surprised you’re not using it more.
It’s like, Sean. would have updated now.
You know?

Oliver Shoulson
11:11 – 11:21
Yeah. I mean, maybe I should give it more of a chance.
But, I mean, the grilling example you gave is such a potent example of, like, two people engaging in a task-based

Nikola Mrkšić
11:21 – 11:21
Yeah.

Oliver Shoulson
11:21 – 12:34
cooperative dialogue, where it’s doing exactly these things that can be so irritating with robots, that humans don’t do. There’s so much presupposition, so many ways that we, like, implicitly index other parts of the conversation and don’t make them explicit. And, you know, every response or question that your interlocutor produces is assumed to be toward a mutual goal that you have and doesn’t require that additional explanation.
And, again, these language models are trained to be over-explanatory because it helps with their predictability. It helps you as the user understand the relationship between your prompt and what it output.
But it’s not the way that humans do things. You know? If we’re working on setting up a grill together or building a piece of IKEA furniture or something, I’m not gonna tell you on every turn, you know, okay.
In order to attach the legs, now you need to put these screws in. I’m just gonna say, alright, next you should put these screws in. Because it’s like we understand what we’re trying to do.
We’re trying to build an IKEA desk. We’re trying to grill some burgers.
And, yeah, I think that really gets to the heart of the issue of plugging LLMs into these kinds of use cases.

Nikola Mrkšić
12:34 – 13:04
Yeah. Yeah.
And, I mean, look, I think that whole, like, task-oriented thing, like, the way that they could be a lot better is by injecting that context. Right? And I feel like it’s not that they forget.
That would be, like, criticizing the model, and the model is very powerful. I think it’s more just, like, how the context is passed and orchestrated.
And the other side is probably, like, a bit of legal liability as well, where they’re just very careful to not be opinionated, or, you know, like, they have to qualify everything so that you end up.

Oliver Shoulson
13:04 – 13:07
for granted. Yeah.
Yeah.

Nikola Mrkšić
13:07 – 13:24
Where do you think it goes longer term? And, you know, I think that, like, you know, you’ve had a lot of experience with implementing LLMs first off with kinda like OpenAI’s models, now with our own. Kinda like, why is it important to have your own LLM? Or is it?

Oliver Shoulson
13:24 – 14:40
Well, I mean, I think that one of the most amazing things about having the in-house LLM that’s specifically trained to work really well within our own proprietary tech stack and control layer is really the key there. Like, you can plug a GPT LLM, an OpenAI LLM, into our Agent Studio, and it will be fine. But it’s not, like, specifically trained and fine-tuned to, like, know exactly what kind of prompting it’s going to receive and how it should behave in response to that prompting.
So that’s a really cool thing about having the in-house LLM. There’s also just that we get to then fine-tune it and train it to specifically counteract some of these less cooperative, not speech-oriented tendencies: things like being too verbose and over-explanatory, things like failing to, you know, reference the speech participants in ways that humans very naturally do, you know, saying things like, “Could you read me your account number?” instead of “Please provide your account number.”
Right? Like, these kinds of subtle bits of syntax, of grammar, that appear in human dialogue, that actually let the speech participants be encoded into the language in ways that, like, LLMs are, I would guess, kind of trained not to do, because they’re meant

Nikola Mrkšić
14:40 – 14:40
is.

Oliver Shoulson
14:40 – 14:41
sort of a system.

Nikola Mrkšić
14:41 – 14:52
But it’s such a good point. It reminds me of another conversation we had.
If you remember, when we looked at a Hebrew implementation, and it was like, I need to know your gender.

Oliver Shoulson
14:52 – 14:52
Yeah.

Nikola Mrkšić
14:52 – 15:05
to see, like, what the right way to say it is, like, right at the end. Like, that was just, like, an example of, like, okay.
Like, best off using a third-person passive.
What is the value of this? Not, like, when did you

Oliver Shoulson
15:05 – 15:05
Yeah. Yeah.

Nikola Mrkšić
15:05 – 15:31
do, right? And another one, like, in Serbian, right? Like, as well, you can get out of a lot of linguistic trouble by using, like, formal tenses, where, you know, you use plural instead of, like, gendered endings of verbs and stuff.
But, yeah, that’s such a good point. Although, all of that is doable technically, especially with, like, the consumer oriented, model that knows a lot about you.
Right?

Oliver Shoulson
15:31 – 15:49
Yeah. And maybe also as we get better with these, like, direct speech-to-speech, kind of real-time models, it will have the ability to, like, take a guess at, you know, how it should be conjugating for your gender, in ways that we can’t currently when there’s, like, an ASR layer in between.

Nikola Mrkšić
15:49 – 15:51
Yep.

Oliver Shoulson
15:51 – 16:02
Though, I mean, I guess, like, this happens on the phone all the time. Like, I don’t have the deepest voice in the world.
Like, I’m sure I’ve been called ma’am over the phone before, and that kind of thing happens. And so.

Nikola Mrkšić
16:02 – 16:03
That’s,.

Oliver Shoulson
16:03 – 16:03
it’s at risk.

Nikola Mrkšić
16:03 – 16:03
know,.

Oliver Shoulson
16:03 – 16:03
as well.

Nikola Mrkšić
16:03 – 16:46
I think, you know, I went through, like, a very serious heavy metal phase when I was, like, 13, 14. And there was an older Serbian woman on the bus who was like, sir, girl, could I have that seat? And I looked at her.
I was like, of course. Then I went to the first hairdresser and, you know, like, near shaved my head.
So it happens, happens in many shapes and forms. But, I think, you know, when you just think of, like, the use of these LLMs and, like, how they need to evolve as you and others prompt around their kind of, like, current limitations, what, like, annoys you the most in a daily, kinda like, workflow implementing an LLM-powered agent on behalf of an enterprise?

Oliver Shoulson
16:46 – 17:43
You know, I think it really is these little quirks of persona or of speech style, as someone who cares very deeply about the user experience and knows, both from a usability standpoint and from this sort of social presence standpoint, that it has these kind of mushier positive impacts: things like trust in the system, satisfaction, and advice adherence, how likely people are to adhere to the advice of the agent. These quirks are really hard to prompt out of them.
And, like, one example that, you know, I spent a whole afternoon trying to find the prompting for is, like, doing a step-by-step walkthrough. If you tell an LLM to walk someone through, you know, changing their account password step by step, after every single step it will prompt the user for confirmation or prompt them to say when they’re ready for the next step. Like, this is not how humans do this.
Like, if I was walking you through, I would say, you know, first go to your account page, and then I’m done talking.

Nikola Mrkšić
17:43 – 17:44
Yep.

Oliver Shoulson
17:44 – 18:13
I know that you’re gonna tell me when you’re there. Like, I don’t have to be like, let me know when you’re ready for the next step.
And what LLMs will do is: go to your account page and click account settings. Let me know when you’re ready for the next step.
And then you say, okay, I’m ready.
And then it’s like, okay, then go to your account details and click on change password.
Let me know when you’re there. Like, this is so annoying, and it’s something that they really want to do because, again, they’re trained to be this, like, text-based assistant kind of persona, not, like, a collaborative partner in accomplishing a task.

Nikola Mrkšić
18:13 – 19:06
You know, like, this really reminds me of, you know, when I started. Like, for example, this one from my wife just comes easily to me today. Her favorite, well, one of her favorite jokes about me is that I sound like, you know, I learned English from a book, which is, you know, not that untrue. But it’s, like, fancy words that non-native speakers who have spent a long time learning a language deeply, without, like, immersion, use.
Similarly, LLMs almost use this, like, written formal casual conversation, which is, “Oh, that’s it. You got it.”
I’m like, I don’t need, like, pep talk help. Like, you know? So whatever I do, it’s, “Oh, that’s exactly right. You did that right.
That’s a good way to think of it.” I was like, oh my god.
I’m like, with every second, my life drains out of my body, and I lose trust in its ability to have a conversation that’s not gonna lead anywhere.

Oliver Shoulson
19:06 – 19:21
There was a period of, like, a few days where Grok was being, like, way supplicatory and was, like, telling everyone, like, you’re the smartest person in the world. Like, amazing. Good.

Nikola Mrkšić
19:21 – 20:15
So I don’t use it much. Right? I think I got a subscription at the point where someone said they were the best, and I tried it out.
And, you know, I think the deep research mode was very good at the time, but I’ve not, like, continued to evaluate. When they released those voice avatars, there was, like, I think, like, the inappropriate Rudy version, which was just, like, a red squirrel that just swore back at you and, like, offended you greatly, where I’m like, well, at least it’s, like, an interesting unhinged model.
Right? Honestly, it was so funny and stupid and, like, 12 years old in spirit that I don’t think it would lead to any self-harm other than, like, laughter and, like, well, lack of usage, because it wasn’t that good. But, yeah, I don’t know what they were really trying to achieve there other than, like, maybe, you know, they were doing it for the lols.
Have you used many of these? Like, what do you use in your daily life in terms of, kinda, like, which LLM?

Oliver Shoulson
20:15 – 20:27
I’m mostly using GPT, OpenAI. It’s just what I have a subscription to, though.
For coding, the Claude models are really good.

Nikola Mrkšić
20:27 – 20:30
Yeah.

Oliver Shoulson
20:30 – 20:50
You know, I’m curious to know a little more about what’s going on under the hood of the, like, thinking and fast GPT-5 models, and, like, how, when you set it to auto, it decides to think longer about certain questions. But, generally,

Nikola Mrkšić
20:50 – 20:52
I think yeah. I feel.

Oliver Shoulson
20:52 – 20:52
very.

Nikola Mrkšić
20:52 – 20:53
feels good.

Oliver Shoulson
20:53 – 20:53
Yeah.

Nikola Mrkšić
20:53 – 22:30
Right? It feels very handcrafted, and I think that there’s a part where it just, like, triggers one of them. It’s just a classifier.
I was hoping they would unify it. I was in awe of, like, the whole, like, four is out.
We’re gonna use five, and we’re gonna make it good. And, you know, I think OpenAI is famous for their, like, work tempo and dealing with pressure.
They caved in here, and I’m not sure that that’s a decision they’re gonna look back on fondly, because the spirit of deep learning is data, one model, back propagation, forcing yourself to grow and the UX to improve.
And they’ve kind of chickened out, and now it’s just, like, a toolkit with 20 different things. And, yeah, maybe it selects it, maybe it doesn’t.
They’ve already allowed a lot of power users to opt out and use 4o or different models, and, like, that just means linear growth on many fronts. And that’s a real shame.
I feel like, you know, I remember I was at one of the OpenAI launch parties where Ilya Sutskever was telling me that, you know, my approach to deep learning research was almost more cowardly, because there was structure in how the model would be structured and learned, rather than, you know, just one model, data, and, like, let it rip. And, you know, it stuck with me, because the guy told everyone exactly what he was gonna do and how he was gonna get there, and he did.
And now I feel like, with the whole, like, divergence of multiple models, they’ve kind of gone towards looking at this massive toolbox. And there’s everything in it, but the discovery problem is you don’t know what it can do.
So if one improves, like, the time it takes for everyone to figure out how to use it is gonna be exponentially longer, and that’s just a shame.

Oliver Shoulson
22:30 – 22:39
Yeah. How do you find, do you ever interact with it in voice mode? Do you ever interact in Serbian with it? Or are you only using English?

Nikola Mrkšić
22:39 – 22:39
All the time.

Oliver Shoulson
22:39 – 22:39
Oh.

Nikola Mrkšić
22:39 – 22:47
And it’s uncanny.
Right? So the flip is very good. The voice is phenomenal.

Oliver Shoulson
22:47 – 22:47
Wow.

Nikola Mrkšić
22:47 – 24:06
There’s dialect ability, where I’m like, alright. Like, you know, I would gently flip towards more, like, say, the way my, like, grandparents from, like, a Serbian part of Croatia would speak, and they would, like, gently interweave a different dialect.
Like, it’s basically how, like, the old Slavic yat turned into, like, e or ije, whatever.
And I would suddenly just flip a few words, and over time, it would start flipping them. And it’s like, you know, if you can follow a dialect implicitly, you can just, like, flip it and be like, correct.
Like, you know, you say, like, a few words that are Croatian, not Serbian, and then it will, like, flip to more of a Zagreb rather than a Belgrade-sounding thing. It is incredible.
Right? And it just shows you how much data it’s been trained on and what’s implicitly gone into the parameters. So whenever we think, like, do we need smaller models? Do we need all that? I’m like, there’s some massive part of that brain that has, like, you know, a big part of the parameters dedicated to, like, you know, the differences across a hundred kilometers of region in former Yugoslavia.
Right? And it’s clearly good. It’s not perfect, but, like, about six to twelve months ago, it was making grammatical errors that it doesn’t make at all now.
So I think prob.

Oliver Shoulson
24:06 – 24:26
Yeah. That sounds great.
Because I just recently sent a, like, a GPT translation into Italian for a project I was doing, and then I sent it to one of our Italian dialogue designers to just, like, look it over. And she was like, this is way too literal.
Like, this is not how people talk. I was really surprised that it gave me a bad Italian translation.

Nikola Mrkšić
24:26 – 24:57
But was the right context provided? Right? Because I think, like, there, it’s again, like, you know, I think it’s quite easy, and I think there are a lot of, like, mockingbirds out there going, like, look what it did. And I’m like, man, it’s, like, able to rewrite something in, like, Serbian nineteenth-century epic poetry style, things that I know human beings are incapable of.
So ridiculing the fact that one thing there ends in the wrong, like, ending or something, it feels to me like we’re, like, missing the bigger point of, like, yeah, the world’s changing.

Oliver Shoulson
24:57 – 25:12
Right. Well, I mean, it feels like, given the volume of data that was fed into it, it sure as hell better be able to do some of this stuff. It sure as hell better be able to do things that humans can’t do, considering the amount of data it consumes would take, like, a hundred million years for a human to consume.

Nikola Mrkšić
25:12 – 25:35
Seconds. Alright.
So maybe, like, just kinda, like, drawing it to a close. In five years’ time, you call, you know, companies, and, you know, systems are running, you know, ideally mostly powered by us.
But what’s the, like, UX outlook on that agent? Does it tell you it’s AI? Do you just know, because most of it is? Like, how does it greet you?

Oliver Shoulson
25:35 – 27:14
You know, I think I I’ve heard you talk about the idea that, has stuck with me for a while, and I don’t know if you still, if you still support this of, like, having a separate ringtone that, like, lets you know in advance if it’s gonna be an AI agent. And that way it doesn’t interrupt the flow of the conversation at all.
But there’s just, like, general literacy out there about, like, what this means, what that means, you know. I think, in general, so many of the problems we’re seeing around AI, both in terms of the actual harms that it causes in society and in terms of the negative user experiences people have interacting with it, have to do with literacy and understanding about what these things are, and what they can do, and how they’re doing what they’re doing.
And so if there’s a greater degree of literacy and understanding around, like, okay, this ringtone means I’m gonna be interacting with a large language model. I have experience doing that. I know what the rules of this conversation are.
Like, we don’t want people to be consciously thinking about that, because that’s what makes an interaction frustrating. But I think that that’s a great way of doing things, where, like, you don’t interrupt the flow of the conversation, but there’s sort of an implicit understanding and consent there.
You know, from the usability side, I think feeling more human is important, but not for its own sake. Like, we don’t win any gold medals just for, like, making the most lifelike-sounding agent.
Like, really, I think what we’re talking about when we talk about a desire to make things lifelike and a desire to make things human is to make them usable. Like, there’s just a degree of affordances offered by human to human interaction.

Nikola Mrkšić
27:14 – 27:14
Yep.

Oliver Shoulson
27:14 – 27:57
that’s still not offered by human-computer interaction. And so we want to make systems that offer those same degrees of affordances, that can understand the density of information encoded in every single thing that you say and can cooperatively respond to it. And so I think that’s really what we’re looking for in the next few years of AI.
And I think that that will come with the sense of them feeling more lifelike and feeling more human. But again, not with that being the goal in its own right.
The goal is to make them usable. The goal is to make them feel trustworthy, and to be trustworthy in how we design them. And the goal is to make, yeah, people have good experiences interacting with them.

Nikola Mrkšić
27:57 – 30:54
100%. And I think, you know, like, the whole notion of, like, what they will be like in the end is that they will be superhuman. And voice clearly dominates in a lot of these interactions still, because we’re better at expressing ourselves and we, you know, we are taught from a young age to express ourselves with our voice.
We’re fast at it. It packs a lot of information.
The tone you use, the words you choose, all really matter. Right? And it’s synchronous. It’s instant.
I think there is a part and, you know, there’s a lot of research that we look into and do ourselves on, like, Gen Z especially. And, you know, Gen Z just because it’s, like, the newest generation entering the workforce heavily.
Right? But, you know, it was true of, like, millennials and, to a different extent, a bit of Gen X. Right? Like, the adoption is different, but, also, like, the whole premise was that they don’t wanna call, and that’s not true.
But it is true that, like, there is more trepidation about these experiences. And I think there’s a very clear, like, provable point that says it’s not about voice or non-voice.
There’s a bit about the social anxiety of it, like, do you wanna speak to a human or not? And it’s almost like this tiredness of, like, hello. Good day.
How are you? And, you know, I’ve always found this very interesting, like, the cultural differences. Right? I’ve spent half my life in the UK, but, you know, I feel like maybe Serbian culture is a bit more transactional, direct, where I’m like, I’ll send someone a message in the morning: hey, where was that thing? And, like, from Brits in particular, who are really big on manners and are, well, of the people I’ve interacted with at scale, definitely the most polite.
The answer would often be from people who are, like, a bit shaken by it. They’ll be like, good morning. How are you? And I was like, sorry, no, no. I didn’t want to chitchat. I just wanted that. Right? And I think that was, like, always really interesting.
And, you know, I was on a call with a bunch of Serbian clients once, which is not very frequent. And I think two minutes into the call, someone just started sharing the screen and talking.
I was like, you see? This is like a feature, not a bug. I’m not the only one.
But what was interesting is, like, I get to do that with voice, with an AI, in a way that’s, like, a bit like on the phone. Oh, how are you doing, and all that. Obviously, I’ve been, like, anglicized partially, and I’ve learned to, you know, go through that, like, trivial bit in the conversation, but you don’t have to.
Right? And then it’s just really fast, and it’s like sport mode, like, back and forth, like, real quick.
I think that’s really for those that want it, they can have it. But, equally, I think there’s a bunch of people who really appreciate the softer side of just, like, being able to chitchat and stuff.
And what’s funny is that they wanna do that with a bot as well, even though they’re not really getting that emotional exchange with another human being.

Oliver Shoulson
30:54 – 32:04
Yeah. Well, I mean, really what you’re talking about here are, like, interesting accessibility considerations.
And it’s funny because, like, this is something that I remember, like, in my first week at the company, you know, over three years ago or whatever, talking about, and getting kind of laughs in response to. Like, oh, you wanna design, like, more accessible conversation styles and, like, more adaptable conversation styles? But, actually, this thing goes beyond that. There’s the sort of cultural differences you’re talking about and, like, the different ways that people like to interact. And one advantage that LLMs and computational systems have over people is that they don’t have these ingrained personalities.
Like, they can sort of flip and switch between what the preferences are. And, sort of as you said, there are gonna be people who and the research continually shows this, that even if they know it’s a robot, they’re going to assign it, you know, human characteristics, like wanting, you know, wanting to do small talk, wanting to be respected, wanting to have polite interaction.
And, you know, we’re certainly seeing that with ChatGPT too, with all these stories in the news of, like, people falling in love with their voice assistants. And, like, despite knowing full well that it’s not a person on the other side, like, we can’t help but assume this intentional stance with respect to these systems.
We can’t help but.

Nikola Mrkšić
32:04 – 32:37
Look at the reaction to 4o being pulled. It’s like, bring her back, bring her back.
The thing is, yesterday was the first time that I saw, like, someone posting something without it being, like, everything just about 4o and, like, how they changed it away from what people had gotten used to. And it’s insane.
It’s kinda like, you know, yeah, I think we’re not done seeing the interesting effects of it, and we’ll see what society will look like ten years from now. Cool.
Well, on that awesome note, thanks for joining me today.

Oliver Shoulson
32:37 – 32:38
Yeah. Thanks.

Nikola Mrkšić
32:38 – 32:38
a pleasure.

Oliver Shoulson
32:38 – 32:40
having me. Nice to talk to you.

Nikola Mrkšić
32:40 – 32:45
Yeah. And to everyone watching, please share, like, subscribe, and we’ll see you in the next one.

About the show

Hosted by Nikola Mrkšić, Co-founder and CEO of PolyAI, the Deep Learning with PolyAI podcast is the window into AI for CX leaders. We cut through hype in customer experience, support, and contact center AI — helping decision-makers understand what really matters.


Never miss an episode.

Subscribe