The voice AI procurement question changed shape in 2026. For five years the framing was binary — build it or buy it — and the answer for most enterprises was "buy", because building a production voice stack required a six-engineer team most operations leaders could not justify. That binary collapsed last year. The orchestrator layer — Vapi, Retell, Bland and a cluster of similar developer-grade platforms — matured into a genuine third path, and the build path itself got cheaper because foundation models, TTS engines, and telephony abstractions have all dropped in cost-per-minute.
The question for any board signing a voice AI commitment in 2026 is no longer "build or buy". It is: which of the three real paths fits the risk you can carry — and which path's hidden costs will you not see until month nine?
This guide is shipped by the team behind Dilr Voice — enterprise voice AI live in 40+ countries — and the operators behind DATS, our five-stage AI consulting methodology. Most of our enterprise engagements start with an AI placement diagnostic that runs this exact build / orchestrate / buy comparison against the buyer's actual call workload.
This post is the 2026 decision framework. Five dimensions, three paths, one industry calibration table, and the seven disqualifiers that automatically rule a path out before procurement starts.
The market context the 2025 framing missed
Sources: McKinsey State of AI (Nov 2025); ServiceNow AI Maturity 2026; BCG AI Value Gap (2025).
The data tells two stories at the same time. AI adoption is saturating — 88% of enterprises now use AI in some form, and Stanford's 2026 AI Index puts adoption growth at near zero because almost everyone is already in. But only 6% of enterprises are converting that adoption into material EBIT, and only 15% have reached ServiceNow's "Optimizing" or "Leading" maturity tiers. The gap between buying voice AI and getting EBIT from voice AI is now the operating problem, and the build / orchestrate / buy decision is where that gap opens or closes.
The 2025 framing — build for control, buy for speed — does not survive contact with the 2026 vendor map. Vapi, Retell, and Bland have all shipped enterprise tier features in the past two quarters (SOC 2 Type II, on-premise deployment options, HIPAA addenda, EU residency). ElevenLabs added on-premise voice AI in April 2026. PolyAI extended into Canada. Managed enterprise platforms — DILR.AI included — now ship with sentiment, transcription, summary, and audit logging built into the base licence rather than as bolt-ons. The path-specific advantages and disadvantages have shifted enough that any procurement decision made on 2024 assumptions will price the wrong path.
The three paths defined
Before any decision matrix, every stakeholder in the procurement conversation needs the same definitions. We've watched too many enterprise boards approve "buy" while their engineering team was actually pursuing "orchestrate" — different timelines, different cost curves, different compliance carry.
Path 1 — Build
You assemble your own voice stack. You contract with a telephony provider (Twilio, Vonage, or equivalent), license a foundation model API, license a TTS engine (ElevenLabs, Cartesia, or in-house), and write the orchestration layer yourself. Your engineers own the conversation loop, the retry logic, the failure modes, the latency budget, the analytics, the compliance instrumentation, and the upgrade path when any underlying provider changes their API.
What you get: full architectural control, no per-call vendor margin, no platform lock-in, and the option to fork or replace any component without contract negotiation. What you carry: a permanent voice AI engineering team (we typically see 4–6 engineers minimum), the integration debt of every external API, the upgrade burden when foundation models shift, and the full liability when something breaks at 3am on Christmas Eve.
Path 2 — Orchestrate
You contract with a developer-grade orchestrator — Vapi, Retell, Bland, Synthflow — and use their platform to assemble agents from the same underlying components (LLM + TTS + telephony) but with the integration layer and the conversation loop pre-built. You still write the prompt logic, the tool-calling configuration, the escalation rules, the integrations with your CRM. You don't write the WebSocket bridge to Twilio or the streaming protocol to the TTS engine.
What you get: 60–80% of build's flexibility at roughly a quarter to a third of the timeline (10–18 weeks against 9–14 months in the matrix below). What you carry: vendor-specific tooling expertise (orchestrator skills don't transfer well), per-minute platform margin on top of your underlying LLM and TTS costs, and the structural risk that the orchestrator changes their pricing model, their roadmap priorities, or their corporate ownership. The 2025-2026 funding cycle made the last risk concrete: we covered the buyer-side play hidden inside the 2026 voice AI funding map, and the vendor consolidation risk sitting underneath every orchestrator contract.
Path 3 — Buy
You contract with a managed enterprise voice AI platform — DILR.AI, PolyAI, Cresta, and a small cluster of similar specialists. The platform ships with the conversation loop, the orchestration, the analytics, the compliance instrumentation, the sentiment and transcription, the audit trail, and the integrations to enterprise systems already built. Your team configures the agent — voice persona, escalation thresholds, business logic, integration endpoints — but does not assemble it from components.
What you get: weeks-to-production rather than months, a single contract that carries the compliance and SLA obligations, embedded analytics that don't need separate procurement, and a vendor whose only product line is voice AI. What you carry: less flexibility on the conversational primitives, vendor lock-in proportional to your integration depth, and per-call pricing that includes the platform vendor's margin on every component beneath it.
The architectural distinction between these three is fundamental, and we covered the underlying orchestration-versus-platform choice in detail in a separate post. This post sits a level above that: it brings build back into the comparison and forces a procurement-grade decision across all three.
The five-dimension decision matrix
The mistake most enterprises make is to compare these three paths on a single dimension — usually time-to-production or three-year cost. The dimensions interact. A path that wins on time but loses on compliance carry is the wrong path for a regulated buyer. A path that wins on cost but loses on engineering depth required is the wrong path for an operations team without a voice AI engineering function.
The five dimensions that matter:
| Dimension | Build | Orchestrate | Buy |
|---|---|---|---|
| Time-to-production | 9–14 months | 10–18 weeks | 4–8 weeks |
| Year-3 TCO @ 1M calls | £3.2–4.8m | £1.4–2.1m | £0.9–1.5m |
| Engineering depth required | 4–6 specialist engineers permanently | 1–2 voice AI engineers + DevOps | 0 voice AI engineers; 1 platform owner |
| Compliance burden carry | You are the controller and processor | Shared — orchestrator is processor, you carry deployer obligations | Vendor carries processor obligations; you carry deployer |
| Procurement risk | Concentration risk on underlying providers | Orchestrator viability + pricing risk | Vendor lock-in + integration depth |
We model TCO with the assumption that the underlying workload is 1 million calls per year averaging 3 minutes per call (3 million voice-minutes annually), with three integrations to enterprise systems, full GDPR + Article 50 + ICO Code coverage, and a 24/7 production SLA. The numbers shift for different workloads but the rank order doesn't — buy is consistently cheaper at enterprise scale once you load the full cost of the build path, including the engineers, the upgrade cycle, the operations burden, and the compliance instrumentation.
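To make the reference workload concrete, the sketch below converts the matrix's Year-3 TCO ranges into implied per-call and per-minute costs. It assumes the TCO ranges are cumulative over three years at a flat 1 million calls per year; the function and constant names are illustrative, not part of any published model.

```python
# Reference workload from the text: 1M calls/year at 3 minutes each.
CALLS_PER_YEAR = 1_000_000
MINUTES_PER_CALL = 3
YEARS = 3  # assumption: the TCO ranges are cumulative over three years

# Year-3 TCO ranges from the matrix, in GBP.
TCO_RANGES = {
    "build": (3_200_000, 4_800_000),
    "orchestrate": (1_400_000, 2_100_000),
    "buy": (900_000, 1_500_000),
}


def unit_costs(tco_low, tco_high):
    """Return ((low, high) per call, (low, high) per minute) in GBP."""
    calls = CALLS_PER_YEAR * YEARS
    minutes = calls * MINUTES_PER_CALL
    per_call = (tco_low / calls, tco_high / calls)
    per_minute = (tco_low / minutes, tco_high / minutes)
    return per_call, per_minute


for path, (low, high) in TCO_RANGES.items():
    per_call, per_minute = unit_costs(low, high)
    print(f"{path:12s} £{per_call[0]:.2f}-{per_call[1]:.2f} per call, "
          f"£{per_minute[0]:.3f}-{per_minute[1]:.3f} per minute")
```

On those assumptions the buy range works out at roughly £0.30–0.50 per fully loaded call and the build range at roughly £1.07–1.60, which is a useful sanity check against any per-call quote that excludes the hidden lines.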
The Year-3 TCO column is the one most boards underestimate. The build path's headline cost (telephony + LLM + TTS per-minute) is genuinely lower than buy's per-call pricing. The full TCO is not. Our voice AI TCO model walks through the seven hidden cost lines vendors typically don't quote: engineering team loaded cost, on-call rota, foundation model upgrade cycles, telephony provider migrations, compliance audit preparation, integration maintenance, and analytics tooling. The first time we run this model with a build-leaning team, the three-year gap is usually £1.5–2.5m larger than they expected.
The seven disqualifiers
Before you weigh the matrix, run these seven questions. Each one automatically rules a path out — saving you a six-week procurement cycle that was always going to land in the wrong place.
Disqualifier 1 — Production deadline under 12 weeks? Build is out. You will not assemble a production-grade voice stack from scratch in under three months even with an experienced team. We've seen one exception in the last two years and it was a four-engineer team with prior voice infrastructure experience working on an unregulated use case. The probability of that profile in your business is near zero.
Disqualifier 2 — Zero voice AI engineering capability today? Build is out, orchestrate is on probation. The orchestrator path requires at least one engineer who can read API documentation, write prompt logic, configure tool-calling, and debug streaming protocol issues. If you don't have that engineer, buy is the only path that lands. Hiring one will take 12–16 weeks in the UK voice AI market — longer than the buy path's full deployment timeline.
Disqualifier 3 — Regulated deployment with EU residency requirement? Orchestrate gets tightly constrained. Many orchestrators run on US-anchored infrastructure with limited EU residency options as of mid-2026. Our EU data residency voice AI guide walks through the architecture that satisfies regulated buyers. Build and buy both clear this gate if you select for it; orchestrate forces you onto a smaller subset of platforms.
Disqualifier 4 — Call volume below 200,000 calls per year? Build is out on economics. The fixed cost of a build team does not amortise below roughly 200k calls annually for most use cases — the engineering rota alone is £600–900k loaded cost. Below that threshold the per-call cost on build paths exceeds the per-call cost on managed platforms.
Disqualifier 5 — No three-year commitment authority from the executive sponsor? Build is out. You cannot ship a voice stack in a year and abandon it the year after — the upfront engineering investment requires multi-year amortisation. The buy path is the only one that supports clean 12-month exit clauses without writing off a sunk engineering investment.
Disqualifier 6 — Hallucination-intolerant use case (financial advice, medical triage, legal counsel)? Orchestrate gets harder. Orchestrator platforms inherit the underlying LLM's failure modes and rarely ship with the deterministic guard layers regulated industries require. We covered this in detail in our voice AI hallucination procurement gate post — buy paths typically ship with the containment instrumentation built in; build paths require you to construct it; orchestrate paths force you to combine multiple platforms.
Disqualifier 7 — Multi-site, multi-locale, multi-language enterprise deployment? Build becomes expensive. The marginal cost of adding a new language, dialect, or regional routing logic on a build path is significant — typically 8–12 engineering weeks per language. Buy paths bundle multilingual support; the multilingual voice AI for enterprise post walks through what scales and what breaks.
Run these seven questions against your specific deployment scope before you do anything else. Most enterprises eliminate at least one path on the disqualifier pass — and the procurement conversation gets tractable immediately.
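The disqualifier pass can be expressed as a small rule table. The sketch below is one possible encoding, not a definitive tool: the `Deployment` fields and the three statuses ("open", "constrained", "out") are hypothetical names chosen for illustration, and "constrained" stands in for the softer wording above ("on probation", "gets harder", "becomes expensive").

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical scope description; field names are illustrative."""
    deadline_weeks: int
    voice_ai_engineers: int
    regulated_eu_residency: bool
    calls_per_year: int
    three_year_commitment: bool
    hallucination_intolerant: bool
    multi_language: bool


def disqualifier_pass(d: Deployment) -> dict:
    """Return {path: (status, reasons)}; status is 'open', 'constrained' or 'out'."""
    status = {"build": "open", "orchestrate": "open", "buy": "open"}
    reasons = {path: [] for path in status}

    def rule_out(path, why):
        status[path] = "out"
        reasons[path].append(why)

    def constrain(path, why):
        # A softer flag never overrides a path that is already out.
        if status[path] != "out":
            status[path] = "constrained"
        reasons[path].append(why)

    if d.deadline_weeks < 12:                      # Disqualifier 1
        rule_out("build", "production deadline under 12 weeks")
    if d.voice_ai_engineers == 0:                  # Disqualifier 2
        rule_out("build", "no voice AI engineering capability")
        constrain("orchestrate", "on probation: no engineer to run it")
    if d.regulated_eu_residency:                   # Disqualifier 3
        constrain("orchestrate", "fewer platforms offer EU residency")
    if d.calls_per_year < 200_000:                 # Disqualifier 4
        rule_out("build", "fixed team cost does not amortise")
    if not d.three_year_commitment:                # Disqualifier 5
        rule_out("build", "no multi-year commitment authority")
    if d.hallucination_intolerant:                 # Disqualifier 6
        constrain("orchestrate", "guard layers must be added separately")
    if d.multi_language:                           # Disqualifier 7
        constrain("build", "8-12 engineering weeks per language")

    return {path: (status[path], reasons[path]) for path in status}


scope = Deployment(10, 0, True, 150_000, True, False, True)
for path, (state, why) in disqualifier_pass(scope).items():
    print(path, state, why)
```

Keeping the reasons alongside each status matters in practice: the list of reasons is what the executive sponsor will want to see when a path is dropped before procurement starts.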
The 3-year TCO model
The TCO numbers in the matrix table compress a real model. Below is the same model unpacked for the same reference workload — 1 million calls per year at 3-minute average, 3 enterprise integrations, full UK regulated coverage, 24/7 production SLA. Numbers are indicative ranges based on DILR.AI engagement data across 2025-2026; your numbers will shift based on call complexity, integration depth, and the specific compliance regime you operate under.
Build path — Year-3 TCO £3.2–4.8m
The headline per-minute cost is the most visible component but accounts for under a fifth of the total.
- Voice AI engineering team (4–6 specialists, loaded cost): £1.6–2.4m over 3 years
- Foundation model + TTS + telephony per-minute (3M voice-minutes/year): £540–810k
- Integration build and maintenance (3 integrations × initial + ongoing): £280–420k
- Compliance instrumentation (audit trail, consent ledger, Article 50 disclosure layer): £140–210k
- Analytics infrastructure (sentiment, summarisation, QA scoring stack): £120–180k
- Production operations (24/7 on-call rota, incident response, telephony failover): £180–270k
- Upgrade and migration cycles (foundation model swaps, telephony provider changes): £140–210k
- Tooling, monitoring, compliance audit prep: £100–150k
The build path's strength shows up only at very high volume — 5M+ calls annually with a clear use case that doesn't shift faster than the engineering team can adapt. Below that, the engineering team's fixed cost dominates everything else.
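As a quick check on where the build money goes, the sketch below sums the line items listed above and computes each line's share of the low-end total. The figures are copied from the list; the dictionary keys are shortened labels chosen for illustration.

```python
# Build-path line items from the list above, GBP thousands over three years.
BUILD_LINES = {
    "engineering team": (1600, 2400),
    "per-minute LLM + TTS + telephony": (540, 810),
    "integrations": (280, 420),
    "compliance instrumentation": (140, 210),
    "analytics infrastructure": (120, 180),
    "production operations": (180, 270),
    "upgrade and migration": (140, 210),
    "tooling, monitoring, audit prep": (100, 150),
}


def totals(lines):
    """Sum the low ends and the high ends of every line item."""
    low = sum(lo for lo, _ in lines.values())
    high = sum(hi for _, hi in lines.values())
    return low, high


def shares(lines):
    """Each line's fraction of the low-end total."""
    low_total, _ = totals(lines)
    return {name: lo / low_total for name, (lo, _) in lines.items()}


low, high = totals(BUILD_LINES)
print(f"Sum of line items: £{low}k-{high}k")
for name, share in shares(BUILD_LINES).items():
    print(f"{name:34s} {share:5.1%}")
```

The items sum to about £3.1–4.65m, close to the rounded headline range. The engineering team is a little over half of the total and the per-minute line roughly a sixth, which is the point of the paragraph above: the fixed team cost dominates.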
Orchestrate path — Year-3 TCO £1.4–2.1m
The orchestrator carries the conversation loop but not the integrations, the prompt logic, or the compliance instrumentation.
- Orchestrator platform fees (per-minute, includes underlying LLM + TTS + telephony): £680–950k
- Voice AI engineering team (1–2 engineers + DevOps share): £420–630k
- Integration build and maintenance: £180–280k
- Compliance instrumentation overlay (orchestrators ship partial; deployer carries the rest): £80–120k
- Analytics gap-fill (orchestrator analytics are usually shallower than enterprise needs): £60–100k
- Production operations (lighter than build, heavier than buy): £80–140k
- Tooling, monitoring, vendor management: £40–80k
The orchestrate path's strength is the middle ground — faster than build, more flexible than buy. Its weakness is that it puts you in the worst-case position if the orchestrator changes their pricing model or gets acquired. We modelled three actual orchestrator transitions in 2025 — each one cost the affected enterprises 4–7 weeks of integration rework and a 25–40% pricing step-up.
Buy path — Year-3 TCO £0.9–1.5m
The buy path bundles the components and amortises the platform vendor's R&D across all customers.
- Platform per-call pricing (3M voice-minutes/year): £720–1.1m
- Platform owner (0.5 FTE for configuration, optimisation, integration management): £75–120k
- Integration touch-up (vendor ships pre-built integrations to common enterprise systems): £40–80k
- Compliance configuration (vendor ships Article 50 disclosure, audit trail, consent ledger; deployer configures): £20–40k
- Analytics already bundled — no separate spend
- Production operations carried by vendor SLA — no in-house rota required
- Tooling and monitoring carried by vendor — no separate spend
The buy path's strength is also its weakness. The per-call line is the largest single cost, but every other line collapses, which is why the total runs roughly 40% below orchestrate and around 70% below build at the same workload.
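The cross-path gap can be recomputed directly from the three line-item lists. The sketch below is a minimal version that compares range midpoints; using midpoints is an assumption made here for simplicity, and comparing low ends or high ends gives slightly different percentages.

```python
# Line-item ranges from the three lists above, GBP thousands over three years.
LINES = {
    "build": [(1600, 2400), (540, 810), (280, 420), (140, 210),
              (120, 180), (180, 270), (140, 210), (100, 150)],
    "orchestrate": [(680, 950), (420, 630), (180, 280), (80, 120),
                    (60, 100), (80, 140), (40, 80)],
    "buy": [(720, 1100), (75, 120), (40, 80), (20, 40)],
}


def total_range(items):
    """Sum of the low ends and sum of the high ends."""
    return sum(lo for lo, _ in items), sum(hi for _, hi in items)


def midpoint(items):
    low, high = total_range(items)
    return (low + high) / 2


def saving_vs(path, baseline):
    """Fraction by which `path` undercuts `baseline`, on range midpoints."""
    return 1 - midpoint(LINES[path]) / midpoint(LINES[baseline])


for path, items in LINES.items():
    low, high = total_range(items)
    print(f"{path:12s} £{low}k-{high}k")
print(f"buy vs orchestrate: {saving_vs('buy', 'orchestrate'):.0%} lower")
print(f"buy vs build:       {saving_vs('buy', 'build'):.0%} lower")
```

On the line items as listed, buy comes out about 43% below orchestrate and about 72% below build at the midpoint. The line-item sums sit close to, rather than exactly on, the rounded headline ranges, so treat the percentages as indicative.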
The TCO comparison is not the whole picture. The hidden line item, the one most boards miss, is opportunity cost: the change management gap and the pilot purgatory risk compound when engineering capacity is fully consumed by the voice AI stack and unavailable for downstream use cases. Build paths usually cost more in lost opportunity than they cost in cash.
Compliance burden carry — the dimension procurement teams underestimate
Compliance carry is the dimension that does not show up in cost models but moves between paths in ways that change the buyer's regulatory exposure. The UK and EU regulatory regime in 2026 distinguishes between the controller and the processor under GDPR, and between the deployer and the provider under the EU AI Act. The path you choose determines which side of each line you sit on.
On build: You are the controller and the processor under GDPR. You are the deployer under the EU AI Act and, because you assembled the system, you may also take on provider obligations for the elements you customised. The full Article 50 disclosure obligation is yours; the ICO AI Code of Practice tool-inventory obligation is yours; the biometric data security architecture is yours to design and audit.
On orchestrate: You are the controller; the orchestrator is the processor for the call audio they handle. You are the deployer for Article 50 purposes — the orchestrator is the provider for the technical system you assembled on top of theirs. The deployer obligations under Article 50 enforcement from August 2026 are still yours; the provider obligations sit with the orchestrator.
On buy: You are the controller; the platform vendor is the processor and may also be the provider for the AI system. The deployer obligations remain with you — Article 50 disclosure, ICO Code transparency, auditability and explainability — but the technical instrumentation that satisfies those obligations ships with the platform.
The practical consequence: enterprises in financial services, healthcare, insurance, and public sector typically choose buy or build (not orchestrate) for production deployments because the compliance attribution is cleaner. Orchestrate works well for less-regulated workloads where the deployer/provider split doesn't trigger the heavier audit requirements. Our voice AI architecture for regulated industries guide walks through which regulatory regime forces which path.
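The role split described above can be written down as a lookup table, which is a convenient form for a procurement checklist. The sketch below only paraphrases the three paragraphs; the `RoleSplit` type and its field names are hypothetical, and the "possibly" entries mirror the hedged wording in the text rather than a legal determination.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RoleSplit:
    """Who holds each regulatory role on a given path (illustrative names)."""
    gdpr_controller: str
    gdpr_processor: str
    ai_act_deployer: str
    ai_act_provider: str


BUYER = "buyer"
ROLES = {
    "build": RoleSplit(BUYER, BUYER, BUYER,
                       "buyer (possibly, for customised elements)"),
    "orchestrate": RoleSplit(BUYER, "orchestrator", BUYER, "orchestrator"),
    "buy": RoleSplit(BUYER, "platform vendor", BUYER,
                     "platform vendor (possibly)"),
}


def roles_held_by_buyer(path):
    """Names of the roles that stay with the buyer on `path`."""
    return [role for role, holder in asdict(ROLES[path]).items()
            if holder.startswith(BUYER)]


for path in ROLES:
    print(path, roles_held_by_buyer(path))
```

Laid out this way, the common thread is visible: the controller and deployer roles stay with the buyer on every path, and only the processor and provider roles move.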
The 4-question architecture test
1. Can we ship to production within our regulatory deadline? (Path 1 vs 2/3 split)
2. Do we have the engineering function to operate this for 5+ years? (Path 1/2 vs 3 split)
3. Does our regulator demand controller-processor clarity? (Path 1/3 vs 2 split)
4. Will our call volume exceed 5M annually within 3 years? (Re-opens Path 1)
Each question collapses one path or holds it open. Walk all four with your executive sponsor before procurement begins. The answer is rarely ambiguous once the four lines have been drawn — most enterprises arrive at the same single path within an hour.
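One way to read the four questions is as successive filters over the set of open paths. The sketch below encodes that reading; it is an interpretation, not the authors' tool. In particular, question 1 is taken to mean "can a build ship within the deadline", and "re-opens Path 1" is modelled as a flag to revisit build rather than an automatic reinstatement. The parameter names are hypothetical.

```python
def architecture_test(build_meets_deadline,
                      has_long_term_engineering,
                      needs_controller_processor_clarity,
                      over_5m_calls_in_3y):
    """Return (sorted list of open paths, whether build deserves a second look)."""
    open_paths = {"build", "orchestrate", "buy"}

    if not build_meets_deadline:                 # Q1: Path 1 vs 2/3
        open_paths.discard("build")
    if not has_long_term_engineering:            # Q2: Path 1/2 vs 3
        open_paths -= {"build", "orchestrate"}
    if needs_controller_processor_clarity:       # Q3: Path 1/3 vs 2
        open_paths.discard("orchestrate")

    # Q4: very high projected volume flags build for review if it was dropped.
    revisit_build = over_5m_calls_in_3y and "build" not in open_paths

    return sorted(open_paths), revisit_build


# A regulated buyer with an engineering team but a tight deadline.
print(architecture_test(False, True, True, False))
```

Buy never drops out in this encoding because none of the four questions, as worded, counts against it; the disqualifier pass and the TCO model are where buy gets tested.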
Industry calibration — what actually gets chosen
Across roughly 200 enterprise engagements in the past 18 months — DILR.AI direct and partner-led — the path distribution is not uniform. Different industries have different default answers because their constraints stack differently.
| Industry | Most common path | Why |
|---|---|---|
| Financial services (banks, lenders, insurers) | Buy | Controller-processor clarity, FCA Code coverage, embedded auditability requirements |
| Healthcare (NHS Trusts, private providers) | Buy or Build | HIPAA/UK GDPR + DSPT carry; build appears at very high volume (NHS scribing at 20,000 clinicians) |
| Insurance (claims intake, broker support) | Buy | FNOL workflows are structured and high-volume; compliance carry concentrated; integration to claims systems pre-built |
| Public sector (councils, local government) | Buy | s114 cost pressure, PSED gates, no in-house engineering rota for voice |
| Outbound sales (SDR augmentation, lead follow-up) | Orchestrate | Speed-to-market priority, less regulated workload, smaller compliance carry |
| Consumer hospitality (hotels, restaurants, services) | Orchestrate or Buy | Volume-sensitive; orchestrate wins below 200k calls/year, buy wins above |
The pattern is consistent. Heavier compliance carry pushes enterprises toward buy. Lighter compliance carry and faster time-to-market priorities push them toward orchestrate. Build is rare and almost always reserved for very high volume (5M+ calls annually), specialist clinical or financial use cases, or organisations with an existing voice AI engineering function. For most UK enterprise buyers in 2026, the realistic choice is between buy and orchestrate — not buy and build.
The hybrid pattern — "buy core + build edge"
The single most-overlooked option in the procurement conversation: buy the platform for the high-volume, repeatable, compliance-heavy workload; build (or orchestrate) the edge case that doesn't fit. The buy platform carries the 80% of call traffic where standard inbound, outbound, escalation, and audit logic apply. The edge — a specialist workflow, a novel use case, an experiment — runs on a thinner stack and graduates to the platform only when it has earned its volume.
This is how most mature deployments end up structured by year two. The buy platform owns the production load; the engineering team runs experiments and novel agents on a lighter orchestrate-or-build stack and migrates them onto the platform when scale demands it. This pattern preserves optionality without paying the full build path's fixed cost. It also gives the engineering team a credible voice AI capability without forcing the whole organisation to depend on it for production.
The hybrid pattern fails in two situations. First, when the buy platform refuses to integrate with externally-built agents — the contract has to explicitly authorise this. Second, when the executive sponsor treats the edge experiments as production-bound from day one, which collapses them back into the full build path. Both failure modes are addressable contractually and operationally — but only if you design the hybrid into the procurement, not after it.
We covered the governance framework that supports hybrid deployments in detail in a separate post, and our operating model post walks through the in-house / vendor / hybrid split at the operating level. This post sits at the procurement level above both.
Procurement-grade contract clauses — regardless of path
Whichever path you choose, the same eight contract clauses determine whether the deal is procurement-grade or procurement-naive. Each one closes a specific procurement risk that the build / orchestrate / buy framing alone does not address.
- Exit and portability. Right to extract call recordings, transcripts, agent configuration, conversation logs, and analytics data in machine-readable format on contract termination. Without this clause, your vendor lock-in is permanent regardless of path.
- IP and training-data exclusion. Your call data is not used to train the vendor's foundation models. Explicit prohibition on derivative-model training from your traffic. This is non-negotiable in regulated sectors and our biometric data security post covers why.
- Latency SLA with service credits. Median latency under 800ms (P50), P95 under 1,500ms, with auto-credit triggers above. Reference the voice agent latency benchmarks data for what enterprise-grade looks like in production conditions.
- Hallucination liability cap and indemnity. The vendor indemnifies against regulatory exposure caused by AI-generated false statements within their scope of control. The cap should be sized against your worst-case regulator exposure, not against the contract value.
- Article 50 disclosure architecture. Vendor commits to ship the disclosure scaffolding required for EU AI Act Article 50 compliance, including caller-side AI disclosure at the start of every interaction.
- Audit rights and explainability artefacts. You can inspect, on request, the decision logic, prompt versions, tool-call traces, and escalation triggers for any call within the retention window. This is the contractual instantiation of the auditability procurement gate.
- Regulatory change indemnity. When the regulatory regime changes (Article 50 enforcement, ICO Code of Practice updates, FCA Code extensions), the vendor commits to update the platform within a fixed window (typically 90 days) without separate change fees.
- Vendor financial health attestation. Quarterly attestation of cash runway, funding status, and material corporate events. The 2026 vendor funding cycle made the vendor-viability question concrete; this clause makes it ongoing.
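The latency clause is only enforceable if both sides agree how P50 and P95 are measured. The sketch below checks a batch of per-turn latencies against the 800ms and 1,500ms thresholds using the nearest-rank percentile method. The choice of nearest-rank, the measurement window, and the function names are assumptions made for illustration; a real contract would need to specify all three.

```python
P50_LIMIT_MS = 800    # median threshold from the clause
P95_LIMIT_MS = 1500   # P95 threshold from the clause


def percentile(samples, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def sla_breaches(latencies_ms):
    """Return the list of breached thresholds ('P50', 'P95') for one window."""
    breaches = []
    if percentile(latencies_ms, 50) >= P50_LIMIT_MS:
        breaches.append("P50")
    if percentile(latencies_ms, 95) >= P95_LIMIT_MS:
        breaches.append("P95")
    return breaches


# Mostly fast turns with two slow outliers: within both thresholds.
window = [500] * 18 + [1200, 1900]
print(sla_breaches(window))
```

A single 1,900ms outlier in twenty turns does not breach P95 under this method, which illustrates why the sample window and percentile definition belong in the contract text alongside the thresholds.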
The fuller treatment of MSA clauses lives in our voice AI MSA contract clauses post — 11 clauses that enterprise legal teams should require regardless of path. The eight above are the ones most procurement teams overlook in the build / orchestrate / buy framing because they sit outside the architectural decision and inside the commercial decision.
The 30/60/90 procurement plan that tests all three paths
The mistake most enterprises make is to commit to a path before testing the others. The procurement plan below tests all three in parallel, then narrows.
Days 0–30: Run the disqualifier pass and the matrix. Walk the seven disqualifiers with your executive sponsor. Build the dimensional matrix against your specific workload — call volume, integrations required, regulatory regime, time-to-production constraint, engineering function size. Most enterprises eliminate at least one path on the disqualifier pass and another on the matrix. End the first 30 days with two paths remaining, not three.
Days 30–60: Pilot both surviving paths against the same call type. Choose one workflow — a single call type with clear success criteria — and pilot it on both paths simultaneously. This is the test enterprises skip because it feels expensive (running two pilots feels like duplication) and it is exactly the test that surfaces the truth. The build vs orchestrate pilots reveal real engineering effort. The orchestrate vs buy pilots reveal real compliance carry. Real deployment beats vendor pitch in every dimension. Our voice AI program design pilot-to-scale post walks through how to architect the pilot so it transfers to scale.
Days 60–90: Commercial-model the surviving path against 3-year scale. Take the pilot winner and run the full 3-year TCO against your real workload (not the reference workload in this post). Layer the eight contract clauses above into the commercial conversation. Negotiate. Sign or walk. The 90-day cadence is achievable for any path; the cadence collapses if you commit on day 30 to a path the pilot would have ruled out.
A note on what we recommend at DILR.AI
We are not pretending to be neutral — DILR.AI is a managed enterprise voice AI platform, which means we sit on the "buy" side of this framework. For most UK enterprise buyers in 2026, our honest answer is: start with buy, hybrid where the edge case demands it, and reserve build for the very narrow set of organisations where the volume, the engineering function, and the strategic case all line up. That answer is not because we sell the buy path — it is the answer we'd give a friend asking off-the-record.
What we'd not recommend is the path most procurement processes default to: a vendor selection that does not test the alternatives, a build case that doesn't price the engineering team, or an orchestrate decision made without a clear view of the underlying foundation model concentration risk. The decision matters in 2026 more than it did in 2024 — because the cost of getting it wrong has compounded through 18 months of vendor change and regulatory shift.
The build / orchestrate / buy decision is the largest single architectural commitment your voice AI programme will make. Run the disqualifier pass. Build the dimensional matrix. Pilot two paths in parallel. Commercial-model the survivor. Sign with the eight contract clauses in place.
Where this fits in the broader procurement architecture
This decision sits inside a six-stage procurement chain. The placement diagnostic identifies where to deploy voice AI in the call estate; this post determines how to source it; the vendor checklist and procurement framework shortlist vendors within the chosen path; the MSA contract clauses translate the procurement decision into commercial terms; the pilot-to-scale program design operationalises the deployment; and the day-2 governance framework keeps the programme defensible after go-live. Each stage compounds the value of the prior one. Skipping the build/orchestrate/buy decision — or making it implicitly by accepting the first vendor's path — is the single most expensive architectural shortcut in enterprise voice AI procurement.
Want to see this in production? Try Dilr Voice live (free, $20 credits), book an AI placement diagnostic that runs this build/orchestrate/buy comparison against your workload, or read about our approach to placing AI inside enterprise systems.
Pick the path your enterprise can carry — not the one the vendor sold last.
Written by the Dilr.ai engineering team, practitioners who ship enterprise AI in production.