The Voice-AI Category Quietly Consolidated — Four Patterns That Survived the Hype
Eighteen months ago everyone was building voice agents. Twelve months later half had pivoted. Now the dust has settled and four distinct patterns dominate UK voice-AI deployments. Here's what fits where, and how to pick without burning a £50K pilot.
Voice AI in 2024 looked like a category about to fragment into a hundred specialised tools. By mid-2025 most of those specialists had pivoted into adjacent categories or been acquired. What remained, and what UK buyers are actually deploying in 2026, are four distinct architectural patterns, each suited to a specific job.
We've spent the last six weeks deep in voice-agent procurement conversations with UK contact centres, support teams, and a handful of regulated-industry buyers. This post is what those conversations crystallised into. If you're scoping a voice-AI project right now, this should help you find the pattern that fits before you commit to a vendor.
Pattern 1: Pipeline-platform-with-telephony
Vendors: Vapi, Retell, Bland AI
The shape: A platform that gives you a phone number, an IVR builder, dial-out capabilities, transfer rules, and CRM hooks. STT, LLM, and TTS are pluggable underneath, but the platform handles the orchestration. You pay per minute, plus per seat for the build environment.
When it wins: Inbound or outbound voice agents that need real telephony — dialling, transferring to humans, queueing, recording. Most contact-centre use cases live here. Your team handles conversation logic and integrations; the platform handles everything below.
The trap: The "platform" framing tempts buyers into thinking they're getting a complete solution. They're not — you still need to design conversation flows, build prompts, manage tool calling, and handle the inevitable edge cases. Estimate 2–3 weeks of build per agent for a serious deployment, not the 2–3 hours the marketing implies. See our Vapi use-case-fit framework for the full evaluation criteria.
Pattern 2: Direct-on-model
Vendors: OpenAI Realtime, Grok Voice, Google Live API
The shape: A single API that takes audio in and returns audio out. STT, LLM, and TTS are all internal to the model. You bring your own telephony (Twilio, Plivo) and your own conversation orchestration code.
When it wins: Custom in-product voice assistants where you have specific UX requirements that pipeline platforms can't accommodate. Also: situations where a particular model's capability matters, such as Grok for real-time freshness, OpenAI Realtime for GPT-class reasoning, or Google Live for multilingual coverage at scale.
The trap: You're now in the orchestration business. Telephony, retry logic, transfer flows, recording, transcript-to-CRM: all yours to build. Worth it when you have unusual requirements; expensive when you don't. This pattern suits a team with backend engineering depth, not an ops manager whose main job is evaluating vendors.
Pattern 3: Voice-quality-led conversational
Vendors: ElevenLabs Conversational AI, Cartesia (recently launched a conversational tier), Hume
The shape: A platform optimised around voice prosody, emotional inflection, multilingual nuance. Often built on a top-tier TTS engine extended into a full conversation surface. Telephony usually optional or third-party.
When it wins: Brand-fronting voice — marketing surfaces, IVR personas, voice avatars in apps, voice characters in games or interactive content. Anywhere the voice itself is the product, not a delivery mechanism.
The trap: Cost per minute is meaningfully higher than on pipeline platforms or direct-on-model APIs, by 3-5× at typical volumes. That is easy to justify for a marketing site getting 500 sessions/day and hard to justify for a contact centre doing 10,000 calls/day. See Speechmatics' position on the STT layer: voice-quality-led platforms still need solid STT, and that is a separate decision.
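To see why the same multiplier lands so differently at the two volumes mentioned above, here is a rough Python sketch. The 3× and 5× multipliers and the 500 and 10,000 daily volumes come from the text; the baseline rate and average call length are placeholder assumptions chosen only to make the arithmetic concrete, not quoted vendor prices.

```python
# Hypothetical inputs: replace with figures from your own vendor quotes.
BASELINE_RATE_PENCE_PER_MIN = 10  # assumed baseline platform rate (placeholder)
AVG_MINUTES_PER_SESSION = 3       # assumed average session length (placeholder)


def daily_cost_pence(sessions_per_day: int,
                     avg_minutes: int = AVG_MINUTES_PER_SESSION,
                     rate_pence_per_min: int = BASELINE_RATE_PENCE_PER_MIN) -> int:
    """Daily spend in pence at a flat per-minute rate."""
    return sessions_per_day * avg_minutes * rate_pence_per_min


def daily_premium_gbp(sessions_per_day: int, multiplier: int) -> float:
    """Extra daily spend (GBP) when the per-minute rate is `multiplier` times the baseline."""
    baseline = daily_cost_pence(sessions_per_day)
    return baseline * (multiplier - 1) / 100


for label, sessions in (("marketing site", 500), ("contact centre", 10_000)):
    low = daily_premium_gbp(sessions, 3)
    high = daily_premium_gbp(sessions, 5)
    print(f"{label}: extra £{low:,.0f} to £{high:,.0f} per day")
```

Whatever rate and call length you plug in, the premium scales linearly with volume, so the contact centre pays 20 times the marketing site's premium for the same voice quality.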
Pattern 4: Specialised vertical voice
Vendors: Hippocratic AI (healthcare), CallRail Convert AI (legal/professional), Sayata (insurance)
The shape: A platform purpose-built for a single regulated or process-heavy vertical. Includes domain-specific scripts, compliance controls, integrations with vertical CRMs (Epic, Clio, Applied), and operational workflows the generic platforms don't ship.
When it wins: Regulated industries where the build cost on a generic platform exceeds the licence cost on a vertical specialist. Also: businesses where the voice agent needs to coordinate with vertical-specific software the generic platforms don't integrate with.
The trap: Vertical specialists charge a premium and lock you in. The migration cost when you outgrow the platform — or when the platform underdelivers on a feature — is significantly higher than with a generic stack. Pick this pattern only when the regulated/integration requirements are real and a generic platform genuinely can't solve them.
How to pick the pattern
Here's the decision tree we'd hand a buyer evaluating voice AI in 2026:
1. Does your job require real telephony with transfers and queueing? If yes → Pattern 1 (pipeline-platform). If no → continue.
2. Is the voice itself a product or brand surface — does the prosody and emotional quality matter more than the conversation logic? If yes → Pattern 3 (voice-quality-led). If no → continue.
3. Is there a regulated vertical with required compliance integrations the generic platforms can't ship? If yes → Pattern 4 (specialised vertical). If no → continue.
4. Do you have backend engineering depth and bespoke UX requirements? If yes → Pattern 2 (direct-on-model). If no → revisit Pattern 1.
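For teams that want to bake the four questions above into an intake form or scoping checklist, they can be sketched as a small Python function. The class, field, and function names are illustrative assumptions; the order of the checks follows the numbered steps exactly, so an earlier "yes" always wins over a later one.

```python
from dataclasses import dataclass
from enum import Enum


class Pattern(Enum):
    PIPELINE_PLATFORM = "Pattern 1: pipeline-platform-with-telephony"
    DIRECT_ON_MODEL = "Pattern 2: direct-on-model"
    VOICE_QUALITY_LED = "Pattern 3: voice-quality-led conversational"
    SPECIALISED_VERTICAL = "Pattern 4: specialised vertical voice"


@dataclass
class Requirements:
    """Hypothetical answers to the four scoping questions."""
    needs_real_telephony: bool          # transfers, queueing, dial-out
    voice_is_the_product: bool          # prosody matters more than logic
    regulated_vertical_gap: bool        # compliance integrations generic platforms lack
    backend_depth_and_bespoke_ux: bool  # engineering team plus unusual UX needs


def pick_pattern(req: Requirements) -> Pattern:
    """Walk the questions in order and return the first pattern that matches."""
    if req.needs_real_telephony:
        return Pattern.PIPELINE_PLATFORM
    if req.voice_is_the_product:
        return Pattern.VOICE_QUALITY_LED
    if req.regulated_vertical_gap:
        return Pattern.SPECIALISED_VERTICAL
    if req.backend_depth_and_bespoke_ux:
        return Pattern.DIRECT_ON_MODEL
    # Step 4 answered "no": the tree sends you back to Pattern 1.
    return Pattern.PIPELINE_PLATFORM


contact_centre = Requirements(True, False, False, False)
print(pick_pattern(contact_centre).value)
```

Note that the fallback at the end is the same result as the first branch: a buyer with no telephony requirement, no brand-voice need, no vertical gap, and no engineering depth is pointed back at a pipeline platform rather than at direct-on-model.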
The most common mismatch we see is buyers picking Pattern 2 (direct-on-model) for a job that fits Pattern 1 (pipeline platform) — typically because the per-minute pricing on the model API looked cheaper. Six months in they've built half a pipeline platform themselves, badly, and the all-in cost is 2-3× what a Vapi or Retell deployment would have cost.
The vendors that didn't make the cut
A few once-prominent voice-AI tools that don't appear above:
- Synthflow — strong product but converging on the same pipeline-platform shape as Vapi without a clear differentiator. Worth tracking but not a default.
- Air.ai — broader marketing claims than the underlying platform supports. Procurement risk.
- Voiceflow — pivoted toward enterprise design + chatbot tooling rather than production voice deployments. Different category in 2026.
What this means for buyers
If you're scoping a voice-AI project: identify the pattern first, then evaluate vendors within it. Don't run a five-vendor RFP across patterns — they're not comparable, and the procurement exercise will surface confusion rather than clarity.
If you're a vendor reading this: the pattern boundaries are now visible to buyers. The category isn't going to support 50 generalist voice-AI tools. The strongest competitive position in 2026 is being clearly the best in one pattern, not being the cheapest across all four.
We'll be following up on this in June with a category-conversion analysis: which voice-AI tool category-pages on DigitalbyDefault.ai actually drove buyers to the vendor sites, and what that tells us about which patterns are converting buyer evaluation into action.