Voice AI TCO: the enterprise costs vendors leave off the price sheet

Every voice AI vendor publishes a per-minute price. Almost none publish what that minute actually costs once you have run it through an enterprise procurement review. Vapi advertises $0.05 per minute. Bland advertises $0.11. Retell quotes $0.07. Each of those numbers is a real number — and each of them is wrong as a planning input for an enterprise voice programme.

The reason is structural. Orchestrator-style platforms charge for the thin layer that routes audio between speech-to-text, an LLM, text-to-speech, and a telephony provider. Everything else is pass-through. Pass-through is fine when you are testing a prototype. It is a procurement disaster when you are funding a 200,000-call-a-month outbound programme and the finance director asks for a fixed cost per outcome.

This post explains the gap between sticker and reality, why it widens at enterprise scale, and how to model voice AI total cost of ownership before you sign anything. If you are still benchmarking the per-minute economics, our voice AI cost per call analysis covers the call-level view that this TCO model sits on top of.

Key takeaway

Orchestrator pricing reveals roughly 20% of true voice AI cost. The other 80% sits in LLM tokens, telephony egress, observability, compliance add-ons, and on-call engineering — and it scales with call volume, not with the licence line.

The gap looks dramatic, but the underlying math is dull. Every minute of audio on an orchestrator triggers four billable services (speech recognition, language model, text-to-speech, carrier minutes), then a fifth line if you turned on observability, then a sixth if compliance forced a region or a model upgrade, then a seventh if you put a human on call to debug it at 2am. None of those line items is in the sticker price. All of them are in your invoice.

What enterprise vendors do not put on the price sheet

The enterprise mistake is not believing the $0.05 number. It is believing it is the operative number. Procurement teams who use sticker-only models routinely under-budget voice AI programmes by 4x to 6x. That is not a vendor failure — it is a category misunderstanding. Orchestrators sell infrastructure. Platforms sell outcomes. The two carry fundamentally different cost shapes.

The five line items hidden behind orchestrator pricing

When CloudTalk and Retell published their 2026 pricing breakdowns, the gap between sticker and true cost on Vapi-style stacks landed in a consistent range: $0.13 per minute for a budget configuration, $0.31 per minute once you swap in GPT-4-class reasoning, ElevenLabs-grade voices and reliable telephony. At those rates, a 4-minute call that looked like $0.20 of orchestration fees costs roughly $0.52 to $1.24 in production, depending on the LLM you picked. None of that is a markup. It is the cost of the components that the orchestrator does not own.

Cost layer	Vapi-style orchestrator	Enterprise platform	Where it surfaces
Per-minute orchestration	$0.05–$0.11	bundled	Vendor invoice
LLM tokens (input + output)	$0.02–$0.20/min, pass-through	bundled	LLM provider invoice
Speech-to-text + text-to-speech	$0.05/min, pass-through	bundled	STT/TTS invoices
Telephony egress + numbers	$0.01–$0.04/min	bundled	Carrier invoices
HIPAA / regional / compliance add-on	$1,000+/month flat	included	Vendor invoice
Observability + on-call eng.	0.5–1.5 FTE	included	Internal headcount
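
The per-minute rows in the table can be added up to give a rough variable rate. The sketch below is only that: the figures are copied from the table, the compliance add-on is taken at its $1,000 floor, and the names `PER_MINUTE_RANGES`, `variable_rate_range` and `loaded_rate` are illustrative rather than part of any vendor's API. Engineering headcount is deliberately left out because it is not a per-minute charge.

```python
# Per-minute component ranges in USD, copied from the table above.
# The names and structure here are illustrative, not any vendor's API.
PER_MINUTE_RANGES = {
    "orchestration": (0.05, 0.11),
    "llm_tokens": (0.02, 0.20),
    "stt_tts": (0.05, 0.05),
    "telephony": (0.01, 0.04),
}
COMPLIANCE_FLAT_MONTHLY = 1_000.0  # the "$1,000+/month" row, taken at its floor


def variable_rate_range(ranges):
    """Sum the low ends and the high ends of every per-minute component."""
    low = sum(lo for lo, _ in ranges.values())
    high = sum(hi for _, hi in ranges.values())
    return round(low, 2), round(high, 2)


def loaded_rate(variable_rate, monthly_minutes, flat_monthly=COMPLIANCE_FLAT_MONTHLY):
    """Spread the flat monthly add-on across the month's minutes."""
    return variable_rate + flat_monthly / monthly_minutes


low, high = variable_rate_range(PER_MINUTE_RANGES)
print(f"Variable cost per minute: ${low:.2f} to ${high:.2f}")
for minutes in (5_000, 50_000, 250_000):
    lo_rate = loaded_rate(low, minutes)
    hi_rate = loaded_rate(high, minutes)
    print(f"{minutes:>7,} min/month: ${lo_rate:.3f} to ${hi_rate:.3f}")
```

The low end of the sum reproduces the $0.13 budget configuration. The high end stacks the most expensive option in every row, so it should be read as a ceiling rather than a typical bill.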

Bland's December 2025 shift to plan-based pricing made this even more visible. A 500-minute month on the Build plan costs $359 — the $299 plan fee plus $60 in call minutes. That is $0.72 per minute fully loaded. Buyers comparing it to the $0.11 sticker rate are off by 6.5x before they have factored in their LLM bill.
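
The arithmetic behind that figure is short enough to check directly. This sketch uses only the numbers quoted in the paragraph above; the function name `effective_rate` is an illustrative choice.

```python
def effective_rate(plan_fee, usage_charges, minutes):
    """Fully loaded per-minute rate once the fixed plan fee is included."""
    return (plan_fee + usage_charges) / minutes


STICKER_RATE = 0.11  # advertised per-minute price quoted above

rate = effective_rate(plan_fee=299.0, usage_charges=60.0, minutes=500)
multiple = rate / STICKER_RATE
print(f"${rate:.2f}/min, {multiple:.1f}x the sticker rate")
```

Because the plan fee is fixed, the multiple shrinks as minutes grow, which is why the same plan looks very different in a pilot and in production.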

Engineering as a recurring cost

The line that most procurement teams miss entirely is engineering. Orchestrators ship raw infrastructure. Someone has to wire it to your CRM, build the prompts, manage prompt drift, run barge-in tuning, debug failed escalations, monitor latency, and respond when an LLM provider has a 4am incident. In our enterprise client base, that is a 0.5 to 1.5 FTE on-call commitment for a single voice programme, or £45,000 to £140,000 a year fully loaded in the UK market, before you have written a line of business logic. Vapi processes 62 million calls a month, and every one of those calls sits inside someone's engineering rota.
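
To put that headcount on the same footing as a per-minute price, the annual figure can be spread across a year of call minutes. This is a rough sketch: the £45,000 and £140,000 bounds come from the paragraph above, while the monthly volumes and the function name `engineering_cost_per_minute` are assumptions chosen for illustration.

```python
def engineering_cost_per_minute(annual_cost_gbp, monthly_minutes):
    """Amortise an annual engineering cost over twelve months of call minutes."""
    return annual_cost_gbp / (monthly_minutes * 12)


ANNUAL_COST_RANGE_GBP = (45_000, 140_000)  # range quoted in the text

# Illustrative volumes only; substitute your own forecast.
for monthly_minutes in (10_000, 100_000, 250_000):
    low = engineering_cost_per_minute(ANNUAL_COST_RANGE_GBP[0], monthly_minutes)
    high = engineering_cost_per_minute(ANNUAL_COST_RANGE_GBP[1], monthly_minutes)
    print(f"{monthly_minutes:>7,} min/month: £{low:.3f} to £{high:.3f} per minute")
```

The result is in pounds, so convert it before adding it to a dollar-denominated vendor rate.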

6.2x: Sticker-to-TCO multiplier on orchestrators

$1k/mo: Vapi HIPAA add-on, fixed regardless of volume

20–50%: Integration as share of build cost (industry range)

15–25%: Annual maintenance as % of original build

The numbers above are not edge cases. They are the rule for orchestrator architectures, drawn from enterprise voice AI cost analyses across 2026. The reason DILR.AI exists as a category, rather than as a Vapi reseller, is that none of these costs scale gracefully past about 100,000 calls per month — the point at which most enterprise programmes start mattering to the P&L.

See it in action

If you want a fixed per-minute number that already includes LLM, telephony, observability, and compliance, that is what the Dilr Voice platform delivers. You can explore it on our inbound solutions page or try it live.

Modelling true voice AI total cost of ownership

A defensible TCO model for an enterprise voice AI programme has four layers, not one. Sticker price is layer one. The other three are where every procurement decision actually breaks. The goal of the model is not to pick the cheapest vendor — it is to pick the vendor whose cost shape matches your call volume, compliance posture, and engineering bench.

Why platform TCO converges where orchestrator TCO drifts

The asymmetry that procurement teams miss is what happens at scale. On an orchestrator, every cost layer scales linearly with call volume: more minutes, more tokens, more carrier fees, more engineering pressure. On a fully integrated platform, the same layers are bundled at one negotiated rate, and the engineering FTE is the vendor's problem rather than yours. The crossover point is usually around 50,000–80,000 minutes per month. Below that, orchestrator economics win; above it, platform economics win decisively.
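
One simple way to locate a crossover of this kind is to treat each option as a fixed monthly cost plus a per-minute rate and solve for the volume where the two lines meet. The sketch below assumes that linear shape. The orchestrator inputs ($0.31 per minute, $1,000 flat add-on) are taken from earlier in this post; the platform inputs are hypothetical placeholders, chosen only so the example lands inside the range quoted above, and are not DILR.AI prices.

```python
def monthly_cost(rate_per_minute, fixed_monthly, minutes):
    """Linear cost model: a fixed monthly amount plus a per-minute charge."""
    return fixed_monthly + rate_per_minute * minutes


def breakeven_minutes(orch_rate, orch_fixed, plat_rate, plat_fixed):
    """Monthly volume at which both options cost the same.

    Returns None when the platform's per-minute rate is not lower than the
    orchestrator's, because the two lines then never cross in its favour.
    """
    rate_gap = orch_rate - plat_rate
    if rate_gap <= 0:
        return None
    return max(0.0, (plat_fixed - orch_fixed) / rate_gap)


ORCH_RATE, ORCH_FIXED = 0.31, 1_000.0   # figures quoted earlier in the post
PLAT_RATE, PLAT_FIXED = 0.15, 11_000.0  # hypothetical, for illustration only

crossover = breakeven_minutes(ORCH_RATE, ORCH_FIXED, PLAT_RATE, PLAT_FIXED)
print(f"Break-even at about {crossover:,.0f} minutes per month")
for minutes in (5_000, 250_000):
    orch = monthly_cost(ORCH_RATE, ORCH_FIXED, minutes)
    plat = monthly_cost(PLAT_RATE, PLAT_FIXED, minutes)
    print(f"{minutes:>7,} min: orchestrator ${orch:,.0f} vs platform ${plat:,.0f}")
```

The value of the exercise is the sensitivity, not the specific number: a small change in either per-minute rate moves the break-even volume substantially, so it is worth rerunning with quoted prices from each vendor.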

The contrarian read here, and one that most analyst content avoids, is that the build-versus-buy decision for voice AI is not really about engineering capability. UK enterprise teams that pick Vapi or Bland often have the engineering bench to operate them. They lose anyway, because an on-call rota for a voice agent stack is the kind of cost that compounds quietly and shows up in the year-three review, not the year-one business case. Our enterprise voice AI buyer's guide covers the procurement criteria that surface this trap before it hits.

The other reason platform TCO converges is contractual. Orchestrator pricing is per-component and changes when any underlying provider raises prices — Bland's $0.09 to $0.11 jump in 2025 is a representative example. Platform pricing is contractual and predictable for the term of your agreement, which matters when finance is modelling a 36-month investment. UK enterprise procurement increasingly requires that level of cost certainty before a voice AI programme can clear governance — and it is a recurring blocker in tenders we see in financial services, healthcare and B2B SaaS. The Dilr customer case studies walk through specific TCO comparisons in deployment, and show how predictable contracted pricing changes the year-three review.
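
The effect of a mid-term price change on a 36-month budget can be sketched with a simple rate schedule. The $0.09 and $0.11 rates are the Bland figures quoted above; the 100,000 minutes per month and the assumption that the increase lands after month 12 are illustrative choices, not data from any contract.

```python
def term_cost(monthly_minutes, rate_schedule):
    """Total cost over a term.

    rate_schedule is a list of (months, rate_per_minute) tuples applied in order.
    """
    return sum(months * monthly_minutes * rate for months, rate in rate_schedule)


MONTHLY_MINUTES = 100_000  # illustrative volume

# Budgeted case: the original rate holds for all 36 months.
budgeted = term_cost(MONTHLY_MINUTES, [(36, 0.09)])
# Actual case: the rate steps up after an assumed 12 months.
actual = term_cost(MONTHLY_MINUTES, [(12, 0.09), (24, 0.11)])

print(f"Budgeted: ${budgeted:,.0f}")
print(f"Actual:   ${actual:,.0f}")
print(f"Overrun:  ${actual - budgeted:,.0f}")
```

This covers the orchestration line only. Any pass-through component that reprices during the term adds its own schedule to the same calculation.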

The procurement question is therefore not "which platform has the lowest per-minute rate" but "which platform's cost shape matches our volume curve". For a 5,000-minute monthly pilot, an orchestrator is fine. For a regulated enterprise programme handling 250,000 minutes a month with a multi-region compliance footprint, the orchestrator stack is a hidden tax on every call. The TCO model is how you make that distinction visible to the finance director before the contract is signed, not after.

For a deeper analyst view of how enterprise AI TCO frameworks are evolving, see Xenoss's enterprise AI TCO breakdown. For a vendor-by-vendor pricing comparison aligned with the numbers above, Retell's 2026 pricing analysis is the most current reference, even though it is published by a competitor. Our own enterprise voice AI services overview sets out where DILR.AI's bundled pricing model sits inside that landscape.

The DILR.AI position on TCO is uncomplicated. Per-minute pricing on our platform includes the seven layers above as a single contracted rate. The trade is that we are not the cheapest sticker — we are the most predictable invoice. For enterprise procurement that is a feature, not a compromise.

Next step

Get a fixed per-minute number you can put in front of finance

Stop reconciling six invoices to model voice AI cost. Dilr Voice bundles orchestration, LLM, telephony, observability and compliance into a single contracted rate so enterprise TCO is predictable for the full term of the programme.

