Nothing has the potential to kill a conversation quite like silence. Maybe that’s why it’s deadly! However, a hallmark of any good voice assistant is its ability to handle any behavior thrown at it, including users remaining silent rather than providing the expected verbal response. No ‘uhh’s, ‘ahh’s, or ‘umm’s about it! Literally. How do we, as designers, ensure our voice assistants remain robust in the face of silence as part of our customer-led experience?
First of all, before we can consider what silence means for a voice assistant, we need to think about why silence occurs and what it means in human-human interactions. Silence can be seen as a linguistic message sent by an “intelligent” human. It’s thought that silence serves a variety of functions, depending on the cultural context; it can be used to question, promise, deny, warn, threaten, insult, request, or command. Furthermore, its interpretation varies depending on the relationship between those involved in the dialogue in which silence is used. For instance, silence could be seen as unintentional when used in relationships in which there is an asymmetry in power, such as between a teacher and a pupil, but intentional and defensive if used in a situation where one party needs to decide what to say and chooses to remain silent, as occurs in the increasingly popular dating phenomenon of ghosting.
Silence in human-robot interactions
This understanding of silence is, however, based on human-human interactions. What does silence mean for human-robot interactions?
Humans generally feel most comfortable with robots when they are anthropomorphized. As Lotte Willemsen said recently at the European Chatbot & Conversational AI Summit, we anthropomorphize robots because we as humans feel a deep desire to connect with other humans, and humanizing technology makes us feel in control. My colleague Oliver discusses this in his blog post, Why seem human? Some theory behind voice assistant design. We would therefore expect humans to anticipate the same behaviors from robots as they do from other humans, and to exhibit those same behaviors themselves in these interactions. Indeed, Kafaee and colleagues have suggested that if a voice assistant cannot adequately use or appropriately interpret silence, it will not pass the Turing test.
It’s therefore best practice to make silence a key part of design in conversational AI, and it is essential to understand the function of silence in human-robot interactions. If the voice assistant is to assume anthropomorphic characteristics to successfully simulate human-human interactions, then we need to assume that the functions and interpretations of silence in human conversation also apply in conversations with voice assistants.
Appropriate responses to silence
A voice assistant’s appropriate response to silence, therefore, needs to be modeled on human responses to silence. The human response to silence is an increase in negative emotions and feelings of rejection. This is because the need for acceptance and belonging evolved as a fundamental aspect of human nature when group cohesion was essential to survival. The modern human response to interpersonal rejection is generally, therefore, an attempt to avoid the social exclusion and literal perceived pain that comes with rejection. One way we do this is through compensatory social behaviors, where we try to “right” a social situation to prevent exclusion. For instance, we might try to ask a question or make a statement to gain a positive response from the party perceived as rejecting us.
How we do this depends on the person, but for many people, rejection is associated with negative changes in one’s self-esteem, so it is feasible that one’s response to perceived rejection (e.g., silence) would reflect this change, such as through a change in tone that conveys feeling less confident in oneself and the social situation. It would therefore make sense for voice assistants to exhibit similar human behaviors in their response to user silence to maintain a simulation of human social dynamics. For instance, the agent could say something like, “Hey, are you there?” in an uncertain, apologetic tone in an effort to assume responsibility for the silence and elicit a response from the user.
However, this isn’t the only solution to silence behavior in conversation design.
Diversity of responses in design
Just as is the case in human-human interactions, we’ve observed that silence seems to serve a number of functions in human-robot interactions, including expressions of confusion or not knowing what to say, distraction, and anticipation of a different outcome (such as having a call transferred). This tells us that designing responses such that they are adaptable to a variety of situations is key to voice assistant success.
An apologetic, “Hey, are you there?” might be appropriate for a situation in which it seems like the user is confused about what to do or distracted, and we want to let them know we realize they haven’t said anything. However, repeating the question posed before the silence, preceded by a concerned “Sorry,” might also do the trick. Repeating the question might be equally applicable in a situation where a technical glitch leads to the user not having heard the voice assistant.
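To make this concrete, here is a minimal sketch of how a response to silence might be selected. It is an illustration only, not a description of any particular product: the `SilenceCause` categories, the `Reprompt` type, and the `reprompt_for_silence` function are hypothetical names, and the sketch assumes some other component has already guessed why the user went quiet.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SilenceCause(Enum):
    """Hypothetical labels for why a user might have stayed silent."""
    CONFUSED = auto()
    DISTRACTED = auto()
    DID_NOT_HEAR = auto()  # e.g., a technical glitch cut off the prompt
    UNKNOWN = auto()


@dataclass
class Reprompt:
    text: str
    tone: str  # a hint for the speech layer; the values here are assumptions


def reprompt_for_silence(cause: SilenceCause, last_question: str) -> Reprompt:
    """Pick a response to silence based on the assumed cause."""
    if cause in (SilenceCause.CONFUSED, SilenceCause.DISTRACTED):
        # Acknowledge the silence and take responsibility for it.
        return Reprompt("Hey, are you there?", tone="apologetic")

    # Otherwise, repeat the question posed before the silence,
    # preceded by a concerned "Sorry".
    # Simplification: lowercasing the first character would be wrong
    # for questions that start with a proper noun.
    question = last_question[:1].lower() + last_question[1:]
    return Reprompt(f"Sorry, {question}", tone="concerned")
```

In this sketch, a user who seems distracted gets the check-in, while a user who may not have heard the prompt gets the original question again. How the cause is inferred, and which tones are available, would depend on the assistant and its persona.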
The design of a response to silence also depends on the persona that is developed for a voice assistant, which may be influenced by designer biases and is definitely influenced by the sector in which the agent will be deployed, as different sectors bring different user groups, each with their own demographics and contexts of use. Voice assistants deployed for pension funds will likely encounter a different set of users than those deployed for hotels, for instance.
In other words, there’s no one-size-fits-all solution. We ultimately seek to design the optimum customer-led experience, and we achieve this through constantly adapting our responses to silence behavior to new scenarios while applying the conventions of human-human interactions to voice assistants. Through this, we’re able to meet the expectations of our users in situations where they remain silent, encouraging them to respond in kind.
Emily Jennings received her MSc in Neuroscience from the Vrije Universiteit Amsterdam and her MSt in Linguistics, Philology and Phonetics from the University of Oxford before joining PolyAI as a Dialogue Designer. Emily has academic interests in the semantic, pragmatic, and biological study of human emotions and empathy, and applies her knowledge of these topics to design customer-led voice assistants.