
The capability dilemma: personifying voicebots

July 27, 2023


One of the most significant factors determining how a user feels about their experience with a technology is the gap between the capability they infer before using it and the capability they actually perceive once they do.

Early in an interaction with a device or interface, a user will subconsciously construct a use image – a mental picture of how the device works and what it can do. When the user discerns a discrepancy between their use image and the technology's actual performance, there is potential for an extremely positive or an extremely negative experience.

On the one hand, if a user overestimates a device’s capabilities, they’ll be disappointed when it underperforms. On the other hand, a device might so exceed the capabilities inferred by the user that it blows their socks off and positively upends their assumptions about that technology entirely.

Under promise; over deliver

A use image is built from the cues a device presents and from past experiences with similar technology.

Think of a TV remote control. The use image is pretty simple. The TV remote cues the user to its capabilities by having buttons labeled with their purposes. A user combines the information provided by these cues with their prior experience with TV remotes (and all devices with buttons) and mentally models how they expect the device in their hand to behave: if I push the mute button, the TV will mute.

Naturally, if the mute button actually did nothing at all, that would be a very negative experience because the device is seriously underperforming its advertised capabilities and violating that mental model.

Conversely, if the remote integrated some new unexpected capabilities (e.g. nowadays, some smart TV remotes have a microphone to let you search for movies and shows without typing out their names), the use image is blown wide open and the user might be amazed.

In a simple application like a TV remote, it seems obvious that you should always “under promise; over deliver.” You want to keep your users pleasantly surprised by your device’s capabilities, hanging onto your most advanced capabilities as a hidden ace up your sleeve.

Complexity and the intentional stance

As a piece of technology becomes more complex, so too does our use image of it. Think of the behavior of an entire computer, which can do lots of different things with lots of different kinds of input.

When the use image becomes so complex as to be almost inscrutable, people tend to fall back to a kind of default mental image, assigning agency and intentions to the device. This is known as the intentional stance.

We default to this intentional stance in our language around technology all the time. Think of the last time something unexpected happened when interacting with a computer or application and you said “Oh, it didn’t like that,” or, “Seems like it wants me to click on this.”

Those kinds of outbursts are indicative of deeper personification of our use images.

It makes sense that we would personify complex technologies, since the most complex and nuanced behavior we ever experience is that of other people, who have genuine wants, needs, likes, dislikes, etc. But personifying technology comes with its own set of risks and rewards…

The risks and rewards of a human-like use image

When it comes to PolyAI voicebots, the sky’s the limit in terms of behavioral complexity.

Voicebots capitalize on human intuitions about interacting with other people and deliberately cue the user to construct a human-like use image of how they behave. This design tactic has great potential to make automated interactions feel extremely natural and frictionless, since we interact with other people effortlessly all day.

A human-like use image is valuable in customer service applications, where people might be disillusioned by generations of underperforming traditional phone menus (think, press 1 to speak to the front desk, press 2 to…).

In this case, maintaining user engagement means captivating the user enough in the first turn that they’ll even give you a chance.

However, a voice assistant carries perhaps the greatest risk of over-advertising its capabilities. If a user has built a human-like use image of the voice assistant, they might implicitly expect the entire range of human-like capabilities.

Research shows that the experience of using a lower-performing device which fosters a human-like use image is far more frustrating than the experience of the exact same capabilities in the context of a more robotic or machine-like use image.

Even the best-performing voice assistant risks fostering a use image that will lead to frustration. So how can we design to mitigate this?

Know your audience

This seems like a real design catch-22. Do we create stilted, robotic interactions that won’t risk personification, or do we create natural sounding voices to drive user engagement?

As is often the case, the answer lies in the customer. We find that the risks and rewards associated with these two options are not the same across every customer demographic.

Current research suggests that older adults are enthusiastic about the prospects of AI-enabled technology, and that the media targeted toward them tends to oversell the capabilities of AI tech and voice user interfaces. When designing primarily for older-adult customers, it can therefore be important to be clear and candid about what the voicebot can and cannot do, and to offer some suggestions for tasks to complete.

On the other hand, for the younger and middle-aged crowd, who may be more familiar with the range of voice user interfaces and their applications, the greater risk is that users will under-infer a voice assistant's capabilities. Here, we want to prioritize preventing user abandonment and give ourselves the chance to surprise them with performance that far exceeds IVR and other voice user interfaces.

Designing a customer-led voicebot goes beyond allowing the customer to guide the conversation once they're engaged; it also involves anticipating customers' feelings and assumptions about our technology and adapting to them before they even pick up the phone.


Oliver Shoulson got his B.A. in Linguistics at Yale University and subsequently worked in New York University’s department of Communicative Sciences and Disorders studying child language development; his publication venues include the Yale Undergraduate Research Journal and the forthcoming Oxford Guide to Australian Languages. At PolyAI, Oliver applies linguistics and conversation analysis to design better, smoother, and more reliable user experiences.

Ready to hear it for yourself?

Get a personalized demo to learn how PolyAI can help you drive measurable business value.

Request a demo
