As someone who has a difficult-to-spell first and last name (thank you, mum and dad), having to give my name or email over the phone has always been a nightmare. From hotel reservations to doctors’ appointments, whether I’m talking to a human being or an automated system, I’ve probably seen every possible spelling of my name. And while sometimes it’s just funny, it often comes with the risk of missing important communications because of an avoidable error. I’ve always wanted a better way to communicate in these situations, and thankfully, now there is one: multimodal communication.
What is multimodal communication?
PolyAI’s multimodal capability brings multiple modes of communication into a single conversation, so customers can use different channels together whenever that makes the most sense. Think of combining voice, text, and multimedia assets to benefit from the strengths of each at the precise moment they’re needed.
Let’s take a look at a couple of real-world examples.
Example 1: Texting rather than speaking
We can all relate to the monotony of setting up new accounts and appointments with service providers in healthcare, utilities, and telecommunications. For those who prefer to do this over the phone, it usually involves answering a laundry list of questions, repeating details, and confirming various spellings. Add challenges like background noise and strong accents, and the frustration reaches the next level. But what if there were a better way?
While voice is flexible enough to be the preferred channel across many use cases, it isn’t always the best option. Key pieces of information like names, emails, and addresses have to be captured with absolute precision, and doing that by voice often requires repetition, regardless of whether someone is speaking to a human agent or an automated solution.
With this headache in mind, submitting the information via a form becomes a much more efficient option.
With multimodal communication, customers can use voice and text simultaneously, engaging with whichever channel they prefer at that moment. The AI agent can begin the conversation over the phone, text the customer a form to submit their details, process that information, and resolve the interaction without needing to escalate it. The customer can even ask questions while completing the form if they run into issues.

Example 2: Using images
A large part of troubleshooting an issue is diagnosing what the problem actually is. Instead of describing which lights on the router are blinking or what color something is, it’s much easier to just send a picture of the issue. This is also true for use cases like reporting a delivery as damaged and requesting a replacement.
Today’s AI agents can understand images and extract information from them. This capability allows for greater self-service and efficiency, because customers don’t need to give long-winded explanations of their issue or be escalated to a human agent. Instead, they can simply send an image and continue the conversation. The AI agent can even talk them through what the image needs to show so that it’s usable.

Example 3: Sharing location
Have you ever had to describe your location to someone when you were in an unfamiliar place? Now, think about doing that with the additional stress of your car breaking down. It’s understandably hard to give precise information in a stressful situation. Multimodal capabilities offer a better solution here.
Instead of describing the breakdown location to roadside assistance, customers can begin the conversation with an AI agent and simply drop a pin to share their location. This gives distressed customers peace of mind that the location they’ve provided is correct, and it lets the issue be resolved more efficiently.

A better way to have a conversation
For years, customers have been promised automated, omnichannel solutions that let them engage with businesses across channels while retaining the context of the conversation. As the examples above show, we’ve all experienced phone calls where it would have been easier to text an answer or send an image than to explain it out loud. In these scenarios, rudimentary automated solutions break down, and customers are escalated to a human agent, where they face the same communication challenge.
PolyAI’s multimodal capabilities resolve these long-standing pain points. Customers can engage with our AI agents on the channel they prefer, switch at a moment’s notice, and share any multimedia asset that helps resolve their issue. The result is more natural, less constrained conversations for customers, and for businesses, a wider scope of automation and higher customer satisfaction through efficient interactions.
Learn more about how multimodal fits into PolyAI agent capabilities and book a demo today.