Guide

Why generalist LLMs fail at voice (and how we fixed it).


In a text chat, a short wait is thinking time. On a phone call, it's a hung-up call.

General-purpose models from OpenAI and Anthropic are built for everything: poetry, coding, and research. That breadth comes at a cost. The models are massive, the lag is noticeable, and when they don't know the answer, they guess.

We built Raven specifically for the hard constraints of live phone calls: sub-300ms latency with no drop in accuracy. It now outperforms GPT-5 and Claude Sonnet 4.6 across all four of our customer service benchmarks, from instruction following to date-time logic.

This is what purpose-built looks like

Raven 3.5 is faster, and it understands how people actually talk. By training exclusively on enterprise customer service conversations, we've removed the overhead that slows generalist models down.

  • Fused training: Raven 3.5 combines GRPO and DPO techniques, active reinforcement, and gold-standard examples, so the model learns both what good looks like and what to avoid, stopping bad habits before they start.
  • Native multilingual thinking: Raven doesn't translate on the fly. It reasons in the target language before it speaks, which is why it handles 23 languages without the hesitation or unnatural phrasing that trips up GPT and Claude.
  • Auto-reasoning: A hidden reasoning trace lets Raven work through complex requests, like calculating a date three weeks out or handling an edge case, in milliseconds. Simpler turns get a direct response.
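To make the auto-reasoning idea concrete, here is a minimal sketch of the general pattern that bullet describes: gate each turn with a cheap complexity check, spend a hidden reasoning step only on complex requests, and answer simple ones directly. This is illustrative only; the function names, heuristics, and structure are our assumptions, not PolyAI's published implementation.

```python
# Illustrative sketch only -- not Raven's actual internals.
# Pattern: route complex turns through a hidden reasoning trace,
# give simple turns a direct, low-latency response.

import re

# Hypothetical heuristics for "this turn needs real reasoning".
COMPLEX_PATTERNS = [
    r"\b(one|two|three|\d+)\s+(day|week|month)s?\b",  # relative date arithmetic
    r"\bexcept\b|\bunless\b",                         # edge-case qualifiers
]

def needs_reasoning(utterance: str) -> bool:
    """Cheap gate: only complex turns pay for a reasoning trace."""
    return any(re.search(p, utterance, re.IGNORECASE) for p in COMPLEX_PATTERNS)

def respond(utterance: str) -> dict:
    if needs_reasoning(utterance):
        # The trace is internal working-out; it is never spoken to the caller.
        trace = f"[hidden] decompose and solve: {utterance!r}"
        return {"reasoning": trace, "reply": "One moment, booking that for you."}
    return {"reasoning": None, "reply": "Sure, I can help with that."}
```

In this toy version, "Can I book a slot three weeks from Tuesday?" trips the date-arithmetic gate and gets a hidden trace, while "What are your opening hours?" goes straight to a direct reply.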

Stop forcing a text tool to do a voice job.

Ready to hear it for yourself?


Get a personalized demo to learn how PolyAI can help you drive measurable business value.