When anyone can build an AI agent, one question matters more than ever
One question separates AI built to last from AI that only works in the demo.
Somewhere in your organization right now, someone is building an AI agent. It might be your CEO. It might be a developer on your IT team using a tool nobody has approved yet. It'll be ready by the end of the week. It'll work in the demo.
The question worth asking, and rarely asked until something goes wrong, is whether it'll work on a Tuesday six months from now, when call volume spikes, the customer on the line is upset, and your brand's reputation is in the balance.
That's the conversation PolyAI CEO Nikola Mrkšić had with Michael Chen, PolyAI's VP of Strategic Alliances, in a recent episode of Deep Learning with PolyAI. The answer turns out to be less about which LLM you picked and more about what that LLM was built to do.
By the end of this piece, you'll have a clearer way to evaluate any AI agent, including those your own team might be building right now. A single question. The kind that works in a vendor meeting or a leadership conversation, and produces an answer that separates AI built to hold up from AI that only looks like it will.
Agent proliferation is more a pressure test than a problem
Jensen Huang's keynote at NVIDIA GTC earlier this year included a number that stopped people in their tracks: $1 trillion in data center revenue through 2027, already on lock. And the people in the room thought he was underselling, maybe even by half.
GitHub's COO reported 1 billion code commits in 2025. Their run rate this year is 14 billion, and their platform's uptime is suffering due to demand.
This is Jevons Paradox in real time: as AI gets cheaper and easier to use, demand accelerates. Commentators thought the need for software engineers would slow with the advent of code-specialized AI, but job postings are spiking. Functions that never thought of themselves as technical (compliance, finance, go-to-market) are starting to build like developers.
This is largely good. The democratization of AI is worth celebrating. But it means, for the first time, you're likely to find yourself evaluating AI agents built overnight alongside those built over years. The demo won't tell you which is which. NVIDIA's most recent announcement made that gap impossible to ignore.
Sounding natural isn't the same as being useful
At NVIDIA GTC, the company unveiled Nemotron, a new speech-to-speech model. People who heard it came back with the same reaction: it sounded strikingly natural and impressively fluent in conversation.
It launched without tool calling.
For anyone outside AI architecture, here's what that means in a contact center. A customer calls, and the AI answers in a warm, human-sounding voice. The customer asks to change their appointment, check their order status, or update a payment method. The AI cannot reach into any other system to do those things, so the call goes to a human.
A voice that can't take action is a very polite hold message.
Voice AI has two requirements at the model level: generating a good response and reliable tool calling, meaning the ability to query, update, and act within the systems your contact center runs on. Everything else lives in how the model and its surrounding infrastructure were designed. If a model can't do both, the interaction fails regardless of how natural it sounds. Nemotron is a useful illustration of a problem that runs through most of what's currently being deployed.
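For readers who want to see what tool calling amounts to in practice, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how Nemotron, Raven, or any specific product works: the function names, the `TOOLS` registry, the in-memory `APPOINTMENTS` store, and the shape of the `tool_call` dictionary are all hypothetical. The idea is simply that the model emits a structured request, and the surrounding system either executes it against a backend or hands the call to a human.

```python
from typing import Callable, Dict, Optional

# Hypothetical in-memory store standing in for a real booking system.
APPOINTMENTS: Dict[str, str] = {"cust-1": "2025-01-10 09:00"}


def get_appointment(customer_id: str) -> str:
    """Query: look up the customer's current appointment."""
    return APPOINTMENTS[customer_id]


def reschedule_appointment(customer_id: str, new_time: str) -> str:
    """Update: move the customer's appointment to a new time."""
    APPOINTMENTS[customer_id] = new_time
    return new_time


# Registry of tools the agent is allowed to call (names are illustrative).
TOOLS: Dict[str, Callable[..., str]] = {
    "get_appointment": get_appointment,
    "reschedule_appointment": reschedule_appointment,
}


def handle_turn(tool_call: Optional[dict]) -> str:
    """Execute the model's requested tool, or hand off if it cannot act.

    `tool_call` is assumed to look like
    {"name": "...", "arguments": {...}}, or None when the model
    produced speech only and requested no action.
    """
    if tool_call is None or tool_call["name"] not in TOOLS:
        return "handoff_to_human"
    result = TOOLS[tool_call["name"]](**tool_call["arguments"])
    return f"done: {result}"
```

A model with no tool calling is the `None` case on every turn: however good the voice, each request that needs a system lookup or update ends in `handoff_to_human`.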
Most AI agents are duct tape in a tailored suit
The gap between AI that works in a demo and AI that works in an enterprise comes down to whether the model you choose and the system built around it were designed for the same job.
The model is the engine. What's built around it determines whether that engine is in a race car or bolted under a shopping cart.
The surrounding system is everything else: the instructions the AI follows, the tools it can call, the guardrails on how it responds, and the logic that determines what happens when something unexpected comes up. That system decides whether your AI takes action when it matters.
Most AI agents being deployed right now are a general-purpose model wrapped in a thin layer of instructions someone assembled recently. They perform in controlled conditions and struggle at the edges, which is exactly where customer experience lives: the upset caller, the unusual request, the conversation that goes somewhere no one planned for.
The question worth taking into your next vendor conversation: Was this AI designed for this job from the ground up, or is it a general-purpose model someone has adapted for CX?
The answer tells you whether you're looking at something built to hold up, or something that worked in the demo.
What built-for-the-job looks like under pressure
PolyAI's proprietary model, Raven, was built specifically for customer conversations: an LLM trained from the ground up to handle what contact centers actually deal with, which is not just FAQs but also authentication, making reservations and appointments, and troubleshooting with irate customers. Agent Studio, the platform built around Raven, has co-evolved with it for years. The two were designed together to deliver a great customer experience. That co-evolution shows up in the infrastructure.
GitHub's infrastructure is straining under agentic demand. PolyAI's isn't. PolyAI maintains 99.999% availability. If a data center fails, the system routes around it automatically, and the agent stays online as a design default.
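To make those two claims concrete, here is a small Python sketch. It is not PolyAI's implementation; the function names, the data center labels, and the health map are hypothetical. The first function converts an availability percentage into a yearly downtime budget, and the second shows the simplest form of routing around a failed data center: try sites in preference order and take the first healthy one.

```python
from typing import Dict, List, Optional


def allowed_downtime_minutes(availability: float, days: int = 365) -> float:
    """Downtime budget, in minutes, for a given availability fraction."""
    return (1.0 - availability) * days * 24 * 60


def pick_data_center(
    preferred_order: List[str], healthy: Dict[str, bool]
) -> Optional[str]:
    """Return the first healthy data center in preference order.

    Sites missing from the health map are treated as unhealthy.
    Returns None only if every site is down.
    """
    for name in preferred_order:
        if healthy.get(name, False):
            return name
    return None


if __name__ == "__main__":
    # 99.999% availability leaves a little over five minutes per year.
    print(round(allowed_downtime_minutes(0.99999), 2))

    # Hypothetical sites: the primary has failed, so traffic moves on.
    order = ["dc-east", "dc-west", "dc-central"]
    status = {"dc-east": False, "dc-west": True, "dc-central": True}
    print(pick_data_center(order, status))
```

At five nines the budget is roughly 5.26 minutes of downtime per year, which is why failover has to be automatic: there is no room in that budget for a person to notice an outage and reroute traffic by hand.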
When a service outage drives a surge in call volume, when a weather event sends thousands of customers to their phones at once, when your brand is already under pressure and the last thing you can afford is a failed interaction, the contact center either holds or it doesn't. A system built for that environment is designed around those moments, not tested against them after the fact. In a crisis, the AI answering your customers' calls is the last thing to go down.
The episode goes deeper, and so does the technology
Michael Chen and Nikola Mrkšić cover considerably more ground in this conversation, including what the next generation of agentic AI looks like, why most speech-to-speech AI still can't take action, and what the Agent Development Kit means for enterprises that want to build on top of PolyAI's platform. Watch the full episode here.
Reading about the difference between purpose-built and general-purpose AI is one thing. Building one yourself is the real proof. Build your first agent in 10 minutes with PolyAI’s Agent Studio right now.