In late July, two of our Dialogue Designers (Harry and Isobel) represented PolyAI at UNPARSED 2023, the world’s first Conversation Design conference. This was a fantastic opportunity to hear industry experts’ perspectives on conversational voice AI and design through lectures, panel discussions, and chats over coffee.
Below, Harry shares the six overarching themes they identified during the conference.
1. Serve the user, not the tech
Unsurprisingly, given the dramatic improvements in Large Language Models (LLMs) over the last year, conversation designers across the industry are looking at how to harness their power. Our approach, however, must be human-centered rather than tech-centered. We cannot start with the question of how to incorporate a given technology into a product. We must instead first ask what our customers need and how we can better meet those needs, and only then determine which technologies are appropriate for the use case.
For example, current iterations of LLMs are known to produce hallucinations: confident but completely false assertions. For many enterprise voice assistants, the risk of hallucination is high enough that using LLMs to generate live responses is simply untenable – imagine a banking assistant giving incorrect guidance around fraudulent payments.
If we start by identifying user needs, we find that there are less obvious but more useful applications of LLMs today, such as:
- Replacing or supplementing traditional intent-based Natural Language Understanding (NLU) systems, which makes it easier for our NLU to take the wider conversational context into account rather than focusing only on what the user said in a single turn
- Clustering unhandled user queries into topics to identify where we can add new design
- Text-based sentiment analysis of live calls to find user journeys that lead to lower customer satisfaction and should therefore be redesigned
2. Serve every user
As conversation designers, we sit between customers and AI algorithms; we therefore have an ethical responsibility to safeguard our end users. This is particularly pertinent where customers have specific accessibility requirements. For example, an agent that repeatedly says, “Sorry, I didn’t understand, can you try again?” to a customer with a speech impediment may cause them severe distress. We must do all we can to make sure we understand every user and, when we cannot, always offer to transfer the call to a human agent.
This is something we as designers must have at the front of our minds from the very beginning of a project’s scoping. Daniel Fraga (Accenture Song) suggests having the whole team sit down at this stage and thoroughly consider the worst-case scenarios, then design the agent to safeguard against these risks. Thinking about what could go wrong teaches us how to get it right; by making an effort to serve each and every one of our end users, we make the whole User Experience (UX) better for everybody.
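To make the “always offer a human” principle concrete, here is a minimal sketch of a fallback policy that re-prompts after a misunderstanding but escalates to a human agent after a fixed number of consecutive failed turns. The class name `FallbackPolicy`, the method `next_action`, the action labels, and the threshold of two are all hypothetical illustrations, not PolyAI’s implementation.

```python
from dataclasses import dataclass


@dataclass
class FallbackPolicy:
    """Hypothetical policy: re-prompt on a misunderstanding, but hand the call
    to a human after `max_failures` consecutive misunderstood turns."""

    max_failures: int = 2  # assumed threshold; would be tuned per deployment
    consecutive_failures: int = 0

    def next_action(self, understood: bool) -> str:
        if understood:
            # Any successful turn resets the counter.
            self.consecutive_failures = 0
            return "continue"
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            # Stop asking the caller to repeat themselves; offer a human instead.
            return "transfer_to_human"
        return "reprompt"
```

Counting consecutive rather than total failures means a caller who is understood most of the time is not transferred because of one early misrecognition, while a caller the system genuinely cannot understand is never trapped in a loop of apologies.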
3. The best UX is a problem solved
Designers can be guilty of trying to build systems that “spark joy” rather than ones that genuinely serve customer needs. In the worst case, this leads to gimmicky agents. A retail virtual assistant that has the best voice quality in the world and can tell hundreds of jokes is no use if it cannot track customers’ orders.
Nat Walker (Vodafone UK) put this brilliantly in her three design laws, paraphrased below:
- Do the thing
- Do it quickly
- Add delight
Only when one step is fully addressed can we move on to the next; User Experience is first and foremost about users, not about experience for its own sake.
4. Don’t run before you can walk
Conversation designers from across the industry spoke of times when they were called in to rescue a chatbot or voice assistant whose poor performance had gone undiagnosed. A recurring theme in these cases was overly ambitious scope: agents designed with hundreds of intents, which led to subpar NLU, and nearly as many FAQs or flows, which meant too much design effort went into answering rarely asked questions and not enough into the highest-volume requests.
In the initial build of a new agent, we have found that less can be more. A better approach is often to identify somewhere between 5 and 20 high-volume use cases, design the agent to handle these effectively, efficiently, and gracefully (in that order!), and transfer all other calls to live agents. Once the agent is live, we can use LLMs to cluster unhandled user input into additional topics and design new behavior to handle more calls.
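The scoping decision described above can be sketched as a simple router, assuming an upstream NLU step has already produced an intent label and a confidence score for each user turn. The intent names, the 0.7 confidence threshold, the `route` function, and the `unhandled_log` list are hypothetical; the log simply collects the utterances that a later clustering step would work on.

```python
from typing import List

# Hypothetical high-volume use cases chosen for the initial build.
SUPPORTED_INTENTS = {"track_order", "reset_password", "opening_hours"}
CONFIDENCE_THRESHOLD = 0.7  # assumed value, not a recommendation

# Utterances we did not handle; candidates for later topic clustering.
unhandled_log: List[str] = []


def route(utterance: str, intent: str, confidence: float) -> str:
    """Handle in-scope, confidently recognized requests; transfer everything else."""
    if intent in SUPPORTED_INTENTS and confidence >= CONFIDENCE_THRESHOLD:
        return f"handle:{intent}"
    # Out of scope or low confidence: record it, then hand off to a live agent.
    unhandled_log.append(utterance)
    return "transfer_to_live_agent"
```

Keeping the in-scope set small and explicit makes the transfer path the default, so design effort concentrates on the handful of intents that carry most of the call volume.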
5. Voice is still king
At UNPARSED, we loved seeing demonstrations of multimodal conversational voice AI systems, such as voice-activated assistants with screens, smart watches, and household robots. In these systems, the visual and/or haptic feedback tends to be supplementary; first and foremost, people choose to speak. For customer service applications, we have consistently found that people still prefer to call.
Voice is, of course, harder to get right: a voice assistant does everything a text-based assistant does but needs at least two additional layers, one that converts the user’s speech into text using Automatic Speech Recognition (ASR) and another that converts the agent’s text back into audio using Text-to-Speech (TTS). For this reason, Hans Van Dam (Conversation Design Institute) suggests that, if in doubt, we should build a voice application first and add text capabilities later; doing it the other way around is much harder.
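The layering described above can be sketched as a text-based assistant wrapped in two extra stages. The functions `fake_asr`, `echo_assistant`, and `fake_tts` are hypothetical stubs standing in for real ASR, dialogue, and TTS components; no real speech engine is called, and `make_voice_turn` is an illustrative name rather than any library’s API.

```python
from typing import Callable


def make_voice_turn(
    asr: Callable[[bytes], str],
    text_turn: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Wrap a text-based assistant turn with ASR on the way in and TTS on the way out."""

    def voice_turn(audio_in: bytes) -> bytes:
        user_text = asr(audio_in)          # layer 1: speech -> text
        agent_text = text_turn(user_text)  # the same logic a text assistant runs
        return tts(agent_text)             # layer 2: text -> speech

    return voice_turn


# Hypothetical stubs so the sketch runs without real speech engines:
# "audio" here is just UTF-8 encoded text.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def echo_assistant(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


voice_assistant = make_voice_turn(fake_asr, echo_assistant, fake_tts)
```

The text core sits unchanged inside the voice wrapper, so the same function can serve a text channel simply by being called directly; the reverse (retrofitting a text-first design for speech) means revisiting that core for ASR errors and spoken delivery.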
6. AI can amplify humanity
As the capabilities of conversational voice AI systems increase, there is a concern that AI will become more and more humanized at the expense of our lifestyles and livelihoods. These fears are legitimate, but they should be seen as a reason to build systems with ethics at the forefront, not a reason to fight against technological progress and innovation. Implemented ethically and effectively, with the user at the center, AI has the power to support and amplify human voices rather than being humanized itself.