
Why seem human? Some theory behind voice assistant design

March 16, 2023


It’s easy enough to say that a Conversational User Interface (CUI) should feel, well, conversational. But beyond showing off our state-of-the-art natural language processing/generation technology, what’s the point of trying to make a human–computer interaction feel like a human–human interaction? And do the potential benefits of a humanlike conversational experience only come into play if someone truly believes they’re speaking to a live person?

Computers are social actors

Since the early 1990s, the Computers Are Social Actors (CASA) paradigm has held strong within the field of human–computer interaction (HCI). The idea, demonstrated in study after study, is that humans effortlessly and automatically apply principles of human sociality to their interactions with machines. This goes beyond simply treating the machine as a proxy or vehicle for the person who programmed it; rather, we genuinely experience our computer counterparts as fellow social beings with personalities, attitudes, and intentions, and we respond to them in kind. That’s pretty remarkable, considering it runs against everything we consciously know and believe about our computers and other devices.

To understand what makes CASA possible, it’s helpful to adopt a popular distinction in social psychology between systematic and heuristic evaluations. When interpreting and judging an external message, humans tend to rely on cognitive shortcuts and rules of thumb (heuristics) rather than deploying the brain’s most intensive resources toward systematically analyzing the content and context of that message. Some common heuristics include:

  • Length Heuristic: a longer message is more thorough and therefore more reliable.
  • Promptness Heuristic: a response that comes quickly is more helpful than one that comes after a delay.

The MAIN Model

We carry a grab-bag of these heuristics in our cognitive back pockets, each with the potential to be triggered quite automatically by different cues we encounter. This wide-ranging model of human thinking has been adapted in the HCI literature into the MAIN model, which identifies four facets of technology that can act as cues, triggering users’ heuristic judgments. These judgments, in turn, have a huge qualitative effect on a user’s experience of the technology and ultimately determine how much they trust it.

  • Modality: cues pertaining to the means or medium of the message. E.g., the Print Heuristic: print media are more serious than digital media.
  • Agency: cues pertaining to whether or not the message came from a conscious source. E.g., the Bandwagon Heuristic: if other people believe this message, then I ought to believe it too.
  • Interactivity: cues pertaining to if and how a user can influence their own experience. E.g., the Choice Heuristic: if I can choose what information I receive, information isn’t being hidden from me.
  • Navigability: cues pertaining to the organization of information in an interface. E.g., the Prominence Heuristic: more prominent messages are more important.

Cues and conflicts in conversation design

To a conversation designer, the MAIN model and its bevy of accompanying heuristics might look like a guidebook for crafting good virtual interactions: all we have to do is maximize the number of positive heuristics triggered in our design and we’ll get the best results from our users!

Well, not so fast: seemingly opposite design choices can often cue similarly desirable (or undesirable) heuristics. Consider choosing between a robotic and a humanlike voice for a voice assistant. A robotic voice might have the desirable effect of triggering the Machine Heuristic, whereby information that seemingly derives mechanically or automatically is seen as more objective and reliable. On the other hand, a human voice triggers the Social Presence Heuristic, whereby the sense of “being with another” produces a greater desire to engage and results in an overall positive attitude toward an interaction.

Keeping it human

Resolving these conflicts is the bread and butter of PolyAI dialogue designers. While each designer brings a unique outlook to the problems of building natural interactions, we are ultimately guided by the principle of building customer-led voice assistants: assistants that center on the human experience and allow the uniquely human aspects of conversation to guide the interaction. Prioritizing humanness doesn’t just sound nice; it is also thoroughly backed by research.

Like heuristic thinking, our ability to anthropomorphize the machines we interact with happens mindlessly and doesn’t require a sincere belief that the computer is a person. Nonetheless, across the literature, anthropomorphism is linked to an array of positive attitudes toward technology: perceptions that a device is useful, trustworthy, and enjoyable, and even an increased intention to use it.

Cues across the MAIN model categories can act as triggers of anthropomorphism. The fact that PolyAI voice assistants meet the needs of our users through turn-based (i.e., back-and-forth) conversation triggers both anthropomorphism and positive heuristics associated with interactivity.

Our principle of beginning every conversation with an open “How can I help?” instead of listing our capabilities triggers both anthropomorphism and the agency-driven Helper Heuristic, whereby a device presented as a willing helper induces a positive affective response in the user.
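
To make the contrast concrete, here’s a minimal, hypothetical sketch of the two opening strategies a designer might weigh. It’s written in Python purely for illustration; the function name and prompt wording are invented here and are not PolyAI’s actual tooling or API.

    # Hypothetical illustration: an open-ended greeting vs. a
    # capability-listing greeting. Names and prompts are invented;
    # this is not PolyAI's actual API.

    OPEN_GREETING = "Thanks for calling. How can I help?"

    MENU_GREETING = (
        "Thanks for calling. You can ask about bookings, opening "
        "hours, or directions."
    )

    def first_turn(open_ended: bool = True) -> str:
        """Return the assistant's opening line.

        The open-ended greeting presents the assistant as a willing
        helper (cueing the Helper Heuristic) and leaves the caller
        free to lead; the menu-style greeting constrains the caller
        to the listed options.
        """
        return OPEN_GREETING if open_ended else MENU_GREETING

    if __name__ == "__main__":
        print(first_turn())       # open-ended: invites the caller to lead
        print(first_turn(False))  # menu-style: lists capabilities up front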

Using a human voice and lifelike custom intonation triggers anthropomorphism and positive heuristics associated with modality, inspiring confidence in the advanced capabilities of our technology.

Customer-led conversations leverage these anthropomorphic cues without relying on convincing every user that they’re speaking to a live person; the positive effects transcend that belief, allowing us to ensure a customer-led experience with every call.


Bio: Oliver Shoulson got his B.A. in Linguistics at Yale University and subsequently worked in New York University’s department of Communicative Sciences and Disorders studying child language development; his publication venues include the Yale Undergraduate Research Journal and the forthcoming Oxford Guide to Australian Languages. At PolyAI, Oliver applies linguistics and conversation analysis to design better, smoother, and more reliable user experiences.
