The pricing model conversation most enterprises get wrong
When an enterprise buyer asks "how much per minute?" they have already lost the pricing conversation. Not because the rate is wrong, but because they are negotiating the wrong variable.
Voice AI vendors compete on at least four incompatible commercial models — per-minute, per-call, per-resolution, and platform subscription with usage top-up. Each one measures a different unit of value. Each one rewards the vendor for something different from what the buyer actually wants. And the enterprise that negotiates only on rate, without understanding the incentive architecture underneath, ends up with a contract that optimises for the vendor's economics, not theirs.
This post maps the four pricing models in detail — what each covers, what it hides, what it rewards, and how to negotiate the structure that aligns the vendor's incentives with your business outcome. The methodology is relevant whether you are evaluating Vapi, Retell, Bland, ElevenLabs, PolyAI, or a managed voice AI service such as Dilr Voice.
This guide is published by the team behind Dilr Voice — enterprise voice AI active in 40+ countries. For a broader AI programme review, see our AI placement diagnostic.
The four pricing models explained
Model 1: Per-minute
Per-minute pricing charges for every second the call is connected — from the moment the line answers to the moment it terminates. It is the oldest voice billing model in telecoms, inherited by voice AI vendors that built on top of telephony infrastructure.
What it includes (or does not). Most per-minute rates bundle STT (speech-to-text) transcription. LLM inference is usually excluded and billed separately per token. Telephony egress (the PSTN termination cost) may be bundled or itemised. Real-time analytics and post-call summaries are typically an add-on. So a published rate of £0.08/min almost never represents the all-in per-minute cost. Our guide to voice AI total cost of ownership maps every cost layer that per-minute billing hides.
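To see how far the headline rate can drift from the all-in figure, here is a minimal Python sketch that adds the cost layers described above into one fully loaded cost per minute. The £0.08/min headline comes from the example above. The LLM, telephony and analytics figures, the `PerMinuteQuote` type, and the assumption that STT is bundled into the headline rate are all hypothetical placeholders, to be replaced with the numbers from a real quote.

```python
from dataclasses import dataclass


@dataclass
class PerMinuteQuote:
    """Hypothetical cost layers for one per-minute quote, all in GBP."""
    headline_rate_per_min: float  # published rate; assumed here to bundle STT
    llm_per_min: float            # LLM inference, billed per token, averaged per minute
    telephony_per_min: float      # PSTN termination, when itemised rather than bundled
    analytics_per_call: float     # post-call summaries and analytics, charged per call


def fully_loaded_cost_per_minute(quote: PerMinuteQuote, avg_call_minutes: float) -> float:
    """All-in cost per connected minute at a given average call duration."""
    if avg_call_minutes <= 0:
        raise ValueError("average call duration must be positive")
    per_minute_layers = quote.headline_rate_per_min + quote.llm_per_min + quote.telephony_per_min
    # Per-call add-ons are spread across the minutes of an average call.
    return per_minute_layers + quote.analytics_per_call / avg_call_minutes


quote = PerMinuteQuote(headline_rate_per_min=0.08, llm_per_min=0.03,
                       telephony_per_min=0.01, analytics_per_call=0.02)
print(round(fully_loaded_cost_per_minute(quote, avg_call_minutes=4.0), 4))  # 0.125
```

With these placeholder inputs, the effective rate is roughly 56% above the headline. That gap is what a total cost of ownership exercise is meant to expose.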
Who uses it. Vapi, Bland AI, and Retell AI all lead with per-minute commercial models because that unit maps directly to their metered infrastructure costs: STT processing, LLM calls, and telephony minutes. It is the easiest model to understand at the top of a sales call and the hardest to benchmark at contract negotiation.
Where it works. Per-minute performs acceptably when call durations are predictable and stable across use cases, and when the call's purpose maps naturally to time — such as a timed reminder or a structured data capture of known script length.
Model 2: Per-call / per-interaction
Per-call pricing charges a fixed amount per initiated call, regardless of duration. It is more common in managed voice AI services and outbound dialler platforms where the buyer's economics are driven by reach (contacts made) rather than time per contact.
What it includes. Per-call pricing typically bundles a fixed call-duration ceiling — for example, up to five minutes per call. Duration beyond the ceiling is either billed per-minute on overage or handled by forced disconnection. This is an important architectural detail to confirm before signing: a forced disconnect mid-conversation is a CSAT event you will own.
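The choice between overage billing and forced disconnection is easy to model call by call. The sketch below assumes a hypothetical contract in which every started minute beyond the ceiling is billed at an overage rate. Passing `overage_per_min=None` models the alternative, where the vendor cuts the call at the ceiling and the fee stays flat. The function name, parameters and round-up rule are illustrative, not any vendor's actual terms.

```python
import math
from typing import Optional, Tuple


def per_call_charge(duration_s: float, call_fee: float, ceiling_s: float,
                    overage_per_min: Optional[float]) -> Tuple[float, bool]:
    """Return (charge, forced_disconnect) for a single call under per-call pricing."""
    if duration_s <= ceiling_s:
        return call_fee, False
    if overage_per_min is None:
        # No overage billing: the call is cut at the ceiling, so the fee is flat
        # but the conversation ends mid-flow (a CSAT event the buyer owns).
        return call_fee, True
    # Assumed rule: each started minute beyond the ceiling is billed in full.
    overage_minutes = math.ceil((duration_s - ceiling_s) / 60)
    return call_fee + overage_minutes * overage_per_min, False


durations = [95, 240, 361, 610]  # seconds, hypothetical sample
ceiling = 300                    # five-minute ceiling, as in the example above
with_overage = [per_call_charge(d, 0.50, ceiling, 0.10) for d in durations]
with_cutoff = [per_call_charge(d, 0.50, ceiling, None) for d in durations]
print(sum(c for c, _ in with_cutoff), sum(cut for _, cut in with_cutoff))  # 2.0 2
```

In the cut-off variant the invoice looks lower, but two of the four sample callers were dropped mid-conversation.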
Where it works. Per-call is a better model than per-minute when: (a) your use case has naturally short, consistent calls; (b) you run high-volume outbound campaigns where your metric is contacts reached, not time per contact; and (c) you can absorb the cost floor for non-productive connects (answering machines, sub-10-second disconnects).
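The trap. Per-call pricing still does not care whether the call resolved anything. A 45-second call ending with "I will call you back" and a four-minute call that books an appointment both cost the same. The vendor has no economic incentive to help you make calls more effective. If anything, a fixed fee per call rewards the vendor for keeping calls short, because its own telephony and inference costs rise with duration while its revenue does not.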
Model 3: Per-resolution / per-outcome
Per-resolution pricing (also called outcome-based pricing) charges only when the agent successfully completes the defined objective. A call that connects but does not book the appointment, resolve the claim, or confirm the payment is either free or priced at a significantly reduced rate.
What it requires. This model requires a contractually precise definition of "resolution." That definition is the most important negotiation in the entire commercial conversation — and the one most buyers do not realise they are having.
Resolution can be defined as:
- Action completion — a booking was made, a payment confirmed, a claim registered (most favourable to the buyer; verifiable in a system of record)
- Intent confirmation — the caller expressed intent to proceed (ambiguous; requires downstream conversion verification)
- Containment — the call reached its conclusion without escalating to a human (favours the vendor on borderline cases)
- Hybrid tier — full resolution charges full price; partial completion charges a reduced rate; failed calls are free
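Of the options above, only the hybrid tier needs a pricing rule rather than a single yes/no gate. The sketch below assumes each call has already been classified against the contractual definition. It also uses a hypothetical `partial_share` parameter, the fraction of the full price charged for a partial completion, which would be a negotiated figure.

```python
from enum import Enum
from typing import Iterable


class Outcome(Enum):
    FULL = "full"        # action completed and confirmed in the system of record
    PARTIAL = "partial"  # partial completion, as defined in the contract
    FAILED = "failed"    # no qualifying outcome


def hybrid_tier_invoice(outcomes: Iterable[Outcome], full_price: float,
                        partial_share: float) -> float:
    """Full price for resolutions, a reduced price for partial completions, nothing for failures."""
    if not 0.0 <= partial_share <= 1.0:
        raise ValueError("partial_share must be between 0 and 1")
    price = {
        Outcome.FULL: full_price,
        Outcome.PARTIAL: full_price * partial_share,
        Outcome.FAILED: 0.0,
    }
    return sum(price[o] for o in outcomes)


calls = [Outcome.FULL, Outcome.FULL, Outcome.PARTIAL, Outcome.FAILED]
print(hybrid_tier_invoice(calls, full_price=4.0, partial_share=0.5))  # 10.0
```

The commercial risk sits in how each call is mapped to an `Outcome`, not in this arithmetic. That is why the definition negotiation matters more than the rate.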
Where it works. Per-resolution is the theoretically optimal model for any use case with a clearly auditable outcome — appointment booking, payment confirmation, claim registration. The vendor earns only when you earn.
Model 4: Platform subscription + usage top-up
The platform model charges a monthly or annual licence fee for access to the software layer — conversation design tools, analytics dashboards, integration connectors, governance and compliance features — with a usage component (per-minute or per-call) layered on top for actual call volume.
What it includes. Platform fees typically cover: workspace and seats, custom vocabulary training, analytics depth, and compliance tooling. The usage layer covers the telephony and AI inference costs that scale with volume. This is the model structurally closest to traditional SaaS commercial terms.
Where it works. Platform pricing benefits buyers who need the full software feature set from day one, have broadly predictable call volumes, and plan to operate the platform long-term with internal teams — not on a use-it-then-leave basis.
The trap. A platform model with no usage cap can behave worse than pure per-minute at scale. The platform fee creates a sunk-cost psychology that discourages buyers from challenging call efficiency — "we're paying the licence anyway" becomes the rationale for tolerating poor containment rates and inefficient call flows.
- Per-minute: longer calls → vendor earns more
- Per-call: more connects → vendor earns more
- Per-resolution: completed outcomes → vendor earns
- Platform + usage: licence lock-in + volume growth
What each model actually incentivises
Understanding the headline pricing model is necessary but not sufficient. The real analysis is: what does this model reward the vendor for doing — and how does that diverge from what the buyer needs?
Per-minute: the long-call bias
A vendor running a per-minute model has a structural preference for longer calls. This does not mean they will intentionally engineer verbose agents — but it does mean that when product trade-offs arise (terse confirmations vs elaborate ones, quick escalations vs self-recovery scripts), the economics favour the longer path.
More practically: on a per-minute model, every efficiency gain (shorter handle times, faster routing, fewer repeat contacts from calls that were contained but not resolved) saves the buyer money and reduces the vendor's revenue. The vendor therefore has no economic incentive to help you with that optimisation. If you are signing a per-minute deal, ensure the contract includes explicit containment-rate benchmarks and accuracy floor commitments, not just as feature promises but as pricing renegotiation triggers if performance degrades. Our voice AI total cost of ownership guide maps exactly how per-minute billing compounds across the full programme cost stack.
Per-call: the volume bias
Per-call pricing rewards call volume over call quality. A vendor optimising for per-call revenue wants you to initiate more calls — including repeats to numbers that failed first time, re-contacts to callers already attempted, and dials into list segments with naturally low pick-up rates.
The metric you care about is resolutions per list worked. The metric the per-call model rewards is calls initiated per list worked. These are not the same thing, and the gap between them is where campaign ROI quietly erodes. A well-designed AI voice business case should separate cost-per-attempt from cost-per-resolution from the very first model. If a vendor's proposal does not distinguish between the two, that is your first negotiation data point.
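Separating the two metrics takes only a few lines. The sketch below compares two hypothetical campaigns over the same list at the same per-call fee. One dials each record once; the other adds redials. All counts and the £0.40 fee are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    list_size: int        # records worked
    calls_initiated: int  # dial attempts, including redials
    resolutions: int      # calls meeting the resolution definition


def unit_economics(c: Campaign, per_call_fee: float) -> dict:
    total_cost = c.calls_initiated * per_call_fee
    return {
        "calls_per_record": c.calls_initiated / c.list_size,    # what per-call pricing rewards
        "resolutions_per_record": c.resolutions / c.list_size,  # what the buyer cares about
        "cost_per_attempt": per_call_fee,
        "cost_per_resolution": total_cost / c.resolutions if c.resolutions else float("inf"),
    }


single_pass = Campaign(list_size=10_000, calls_initiated=10_000, resolutions=1_200)
with_redials = Campaign(list_size=10_000, calls_initiated=16_000, resolutions=1_300)
for name, campaign in (("single pass", single_pass), ("with redials", with_redials)):
    m = unit_economics(campaign, per_call_fee=0.40)
    print(name, round(m["cost_per_resolution"], 2))
```

Cost per attempt is identical in both runs. Cost per resolution, however, is roughly 48% higher with redials (about £4.92 against £3.33), because the extra 6,000 calls added only 100 resolutions.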
Per-resolution: incentive alignment — and the definition trap
Per-resolution is the model with the most favourable incentive alignment in principle. The vendor earns only when you do. In practice, the risk lies entirely in the definition.
A vendor with a per-resolution commercial model has strong incentives to:
- Broaden the resolution definition during negotiation — "a call that reaches 90 seconds with a positive sentiment signal counts as resolved"
- Push containment as the proxy metric — "any call we handle end-to-end without escalation counts as a resolution"
- Exclude failure modes from the resolution gate — "machine-answer connects do not count, but 'I will think about it' does"
The safest resolution definition for the buyer is action completion with system-of-record confirmation — the booking hit the CRM, the payment is in the transaction log, the claim is registered in the platform. Any softer definition creates room for disputed invoicing and vendor-side optimisation away from your actual business outcome. The voice AI MSA contract clauses guide covers how to encode the resolution definition as a contractual commitment rather than a commercial understanding.
Platform subscription: the lock-in mechanics
Platform pricing creates a different category of misalignment — not at the per-call level, but at the contract renewal level. Once your team has built conversation flows on a platform's proprietary design tooling, integrated its analytics into your governance reporting, and embedded its compliance outputs into your audit documentation, switching becomes expensive regardless of pricing.
The vendor's incentive is therefore to make the platform stickier, not necessarily better. Watch for platform contracts where: conversation flows are stored in a proprietary format with no export API; analytics are accessible in the UI but not exportable in standard formats; and number porting is technically possible but requires a 90-day migration process. These are lock-in mechanics dressed as product design decisions.
Decision framework: matching model to use case
Per-resolution: the agentic AI pricing frontier
Per-resolution pricing is not new in service industries. Outcome-based fee structures exist in legal (contingency), recruitment (placement fees), and debt recovery (commission on collected amounts). What is new in 2026 is the arrival of agentic voice AI that can complete end-to-end transactions autonomously, making per-resolution pricing technically viable at scale for the first time.
The enabling capability is the tool-calling layer. An agent that can query your CRM, check appointment availability, create a booking record, and dispatch a confirmation SMS has a clearly auditable, system-verified outcome. The resolution event is not a matter of interpretation — it is a state change in your system of record. This is categorically different from earlier conversational voice AI, where the "outcome" of a call was information exchanged (the caller's preferred day) rather than a transaction completed.
As the agentic model matures, per-resolution pricing will become the dominant commercial structure for high-value enterprise voice AI deployments — precisely because it makes vendor economics dependent on platform reliability and outcome quality rather than call volume. Buyers evaluating vendors in 2026 should assess which providers have the technical architecture to support per-resolution billing: verifiable action completion, an immutable audit log, and a CRM confirmation webhook. Vendors who cannot provide all three are not equipped to honour a genuine per-resolution commercial commitment.
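Of the three capabilities, the immutable audit log is the easiest to misunderstand. One common way to make a log tamper-evident is a hash chain: each entry's hash covers the previous entry's hash, so editing any past record breaks verification from that point on. The Python sketch below illustrates only the idea, and its class and field names are hypothetical. A hash chain makes tampering detectable rather than impossible, and it cannot detect a truncated tail by itself. A production log would also need write-once storage and a latest hash anchored somewhere the vendor does not control.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64


def _entry_hash(prev_hash: str, event: dict) -> str:
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class ResolutionAuditLog:
    """Append-only, hash-chained record of resolution events (tamper-evident sketch)."""

    def __init__(self) -> None:
        self.entries: List[Dict] = []

    def append(self, event: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {"event": dict(event), "prev": prev_hash, "hash": _entry_hash(prev_hash, event)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute every hash; editing, reordering or deleting an earlier entry breaks the chain."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash or entry["hash"] != _entry_hash(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True


log = ResolutionAuditLog()
log.append({"call_id": "c-001", "type": "booking", "status": "confirmed"})
log.append({"call_id": "c-002", "type": "payment", "status": "settled"})
print(log.verify())                              # True
log.entries[0]["event"]["status"] = "cancelled"  # simulate a retrospective edit
print(log.verify())                              # False
```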
Verifying that technical readiness is part of any serious procurement process. The voice AI SLA design guide covers how to contractually bind the technical capability that makes per-resolution billing auditable.
The real negotiation: structure, not rate
Most enterprise procurement teams negotiate voice AI pricing by asking for a percentage discount on the headline rate. Vendors expect this and have built margin to absorb it. The buyer who genuinely wins the pricing negotiation is the one who changes the structure, not just the rate.
Six structural moves that shift economics significantly:
1. Resolution-definition negotiation
If you are accepting any form of outcome-linked pricing, define "resolution" in the contract with reference to a specific system event, not a caller state. "Booking created in [CRM system] with status: confirmed" is a resolution. "Caller expressed intent to proceed" is not. Tie the definition to your system of record — something you can independently verify without relying on the vendor's own reporting.
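Encoding the definition as a predicate over system-of-record events makes the independent check mechanical. The sketch below assumes a hypothetical CRM export in which each event carries the originating call ID. The `CrmEvent` fields and the `booking`/`confirmed` values are placeholders for whatever your own system records.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class CrmEvent:
    call_id: str      # ID of the voice AI call that produced the event (assumed to be exported)
    object_type: str  # e.g. "booking"
    status: str       # e.g. "confirmed"


def is_resolution(event: CrmEvent) -> bool:
    """Contractual definition: a booking created in the CRM with status 'confirmed'."""
    return event.object_type == "booking" and event.status == "confirmed"


def audit_billed_resolutions(billed_call_ids: Iterable[str],
                             crm_events: Iterable[CrmEvent]) -> Tuple[Set[str], Set[str]]:
    """Split vendor-billed resolutions into CRM-verified and disputed call IDs."""
    verified = {e.call_id for e in crm_events if is_resolution(e)}
    billed = set(billed_call_ids)
    return billed & verified, billed - verified


events = [
    CrmEvent("c1", "booking", "confirmed"),
    CrmEvent("c2", "booking", "tentative"),  # caller "intends to proceed": not a resolution
    CrmEvent("c3", "enquiry", "confirmed"),
]
ok, disputed = audit_billed_resolutions(["c1", "c2", "c4"], events)
print(sorted(ok), sorted(disputed))  # ['c1'] ['c2', 'c4']
```

The disputed set is your evidence in an invoicing dispute, and it is produced without relying on the vendor's own reporting.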
2. Price-lock duration with infrastructure decay trigger
Voice AI infrastructure costs are falling materially. LLM inference costs dropped significantly in 2025 and continue to decline as model efficiency improves. A 24-month per-minute deal at a locked rate means the vendor is locking in today's margins as their underlying costs decrease. Negotiate annual pricing reviews with a mechanism — pegged either to a published LLM token benchmark or to a vendor-specific infrastructure cost acknowledgement — that passes declining infrastructure costs through to the buyer.
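One way to write the review mechanism down is as a formula applied at each annual review. The sketch below is a hypothetical construction, not an industry standard. It assumes the parties agree three things: what share of the rate is attributable to infrastructure (`infra_share`), how far the agreed benchmark fell over the period (`benchmark_decline`), and what fraction of that saving reaches the buyer (`pass_through`).

```python
def reviewed_rate(current_rate: float, infra_share: float,
                  benchmark_decline: float, pass_through: float = 1.0) -> float:
    """Per-minute rate after an annual review with infrastructure cost pass-through.

    infra_share: fraction of the rate attributed to infrastructure (e.g. LLM inference)
    benchmark_decline: fractional fall in the agreed benchmark (0.25 = 25% cheaper)
    pass_through: fraction of the resulting saving passed to the buyer
    """
    for name, value in (("infra_share", infra_share),
                        ("benchmark_decline", benchmark_decline),
                        ("pass_through", pass_through)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return current_rate * (1 - infra_share * benchmark_decline * pass_through)


# Hypothetical: 40% of a £0.10/min rate is inference, the benchmark fell 25%, full pass-through.
print(round(reviewed_rate(0.10, infra_share=0.4, benchmark_decline=0.25), 4))  # 0.09
```

Because `benchmark_decline` cannot go below zero, this mechanism only ratchets downward. Whether a rising benchmark should be allowed to raise the rate is itself a negotiation point.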
3. Volume tier structure aligned to your call profile
Most per-minute and per-call contracts include volume tiers, defined in hours for per-minute pricing (0–500 hours per month, 500–2,000 hours, and so on) or in call counts for per-call pricing. These default tiers are designed around median customer profiles, not yours. Negotiate tiers based on your actual call distribution and seasonal profile. An annual pooled minutes structure, where unused minutes in quiet months roll into peak months, is particularly valuable for retail, insurance, and utilities buyers with strong seasonal patterns.
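The value of pooling shows up clearly when the same usage pattern is run through a fixed monthly allowance and through a pool that rolls unused minutes forward. This is a minimal sketch with hypothetical usage figures. Note that a roll-forward pool only helps when the quiet months come before the peak. A buyer whose peak falls early in the contract period may prefer a single true-up against the total allowance, modelled here as a third option.

```python
from typing import Sequence


def overage_fixed_monthly(usage: Sequence[int], allowance: int) -> int:
    """Minutes billed as overage when each month's allowance expires unused."""
    return sum(max(0, used - allowance) for used in usage)


def overage_rolling_pool(usage: Sequence[int], allowance: int) -> int:
    """Minutes billed as overage when unused allowance rolls into later months."""
    banked = 0
    overage = 0
    for used in usage:
        available = allowance + banked
        if used <= available:
            banked = available - used
        else:
            overage += used - available
            banked = 0
    return overage


def overage_period_true_up(usage: Sequence[int], allowance: int) -> int:
    """Minutes billed as overage when usage is reconciled once against the period's total allowance."""
    return max(0, sum(usage) - allowance * len(usage))


# Hypothetical seasonal profile (minutes per month) against a 1,000-minute monthly allowance.
usage = [800, 800, 1400, 1000]
print(overage_fixed_monthly(usage, 1000), overage_rolling_pool(usage, 1000),
      overage_period_true_up(usage, 1000))  # 400 0 0
```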
4. Failed-call rate cap
In any per-call model, negotiate a hard cap on the rate charged for non-productive connects: answering machine detections, sub-10-second disconnects, and calls aborted by compliance logic (DNC flags, outside permitted hours). A failed-call rate of 30–50% of the standard per-call rate is a reasonable starting benchmark. These are not resolutions; full per-call pricing for them inflates your effective cost-per-resolution significantly.
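The effect of a failed-call rate on effective cost-per-resolution is simple arithmetic. The sketch below uses hypothetical volumes and a hypothetical £0.40 per-call fee. It compares full pricing for non-productive connects with a 40% rate, inside the 30–50% range suggested above.

```python
def cost_per_resolution(productive_calls: int, failed_calls: int, resolutions: int,
                        call_fee: float, failed_share: float) -> float:
    """Effective cost-per-resolution when failed connects are billed at failed_share of the fee.

    failed_calls covers answering-machine detections, sub-10-second disconnects and
    compliance aborts; failed_share=1.0 means they are billed at the full per-call rate.
    """
    if resolutions <= 0:
        raise ValueError("resolutions must be positive")
    total = productive_calls * call_fee + failed_calls * call_fee * failed_share
    return total / resolutions


full = cost_per_resolution(6_000, 4_000, 1_500, call_fee=0.40, failed_share=1.0)
capped = cost_per_resolution(6_000, 4_000, 1_500, call_fee=0.40, failed_share=0.4)
print(round(full, 2), round(capped, 2))  # 2.67 2.03
```

In this made-up profile, 40% of connects are non-productive. Capping their price cuts the effective cost per resolution by about 24%.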
5. Usage audit rights and reconciliation window
Every voice AI contract should include the buyer's right to reconcile billed usage against call logs at individual call level. Metering discrepancies between vendor-counted and buyer-counted minutes are common — call setup and teardown timing, rounding logic, and minimum billing durations can all diverge. Negotiate monthly usage statement access in a machine-readable format and a 30-day dispute window. Without this, the vendor's billing system is the only source of truth. The CFO's 14 procurement questions includes usage reconciliation as one of the financial diligence items.
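A call-level reconciliation can be automated once both sides export per-call durations. The sketch below assumes a hypothetical vendor billing rule: each call is rounded up to a 60-second increment, with a 60-second minimum. Replace `increment_s` and `minimum_s` with the rounding and minimum-duration terms in your own contract. The sketch flags two problems: billed calls that do not appear in the buyer's logs, and calls billed above what the contractual rule implies for the buyer-measured duration.

```python
import math
from typing import Dict, List, Tuple


def expected_billed_seconds(duration_s: float, increment_s: int = 60, minimum_s: int = 60) -> int:
    """Billable seconds implied by the contract's rounding and minimum-duration terms."""
    return max(minimum_s, math.ceil(duration_s / increment_s) * increment_s)


def reconcile(buyer_log: Dict[str, float], vendor_statement: Dict[str, int],
              tolerance_s: int = 0) -> Tuple[List[str], Dict[str, int]]:
    """Compare vendor-billed seconds with buyer-measured durations, keyed by call ID.

    Returns (call IDs billed but absent from buyer logs, {call ID: excess billed seconds}).
    """
    unknown = sorted(set(vendor_statement) - set(buyer_log))
    overbilled: Dict[str, int] = {}
    for call_id, measured_s in buyer_log.items():
        if call_id not in vendor_statement:
            continue
        excess = vendor_statement[call_id] - expected_billed_seconds(measured_s)
        if excess > tolerance_s:
            overbilled[call_id] = excess
    return unknown, overbilled


buyer_log = {"a": 45.0, "b": 130.0, "c": 61.0}
vendor_statement = {"a": 60, "b": 240, "c": 120, "x": 60}
print(reconcile(buyer_log, vendor_statement))  # (['x'], {'b': 60})
```

A non-zero `tolerance_s` absorbs small clock differences between the vendor's call setup and teardown timestamps and your own. The acceptable tolerance is a contract term, not a technical constant.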
6. Exit terms and data portability pricing
The pricing conversation is also the moment to negotiate exit economics. What is the practical cost of extracting your call recordings, transcripts, CRM integration configuration, and conversation flow designs if you switch vendors in 18 months? A per-minute model with expensive or slow exit terms is not a per-minute model — it is a per-minute model plus a switching-cost liability that your CFO has not modelled.
Negotiate: a 30-day maximum for full data export (recordings, transcripts, and analytics in standard formats); automated export tooling included in the base contract; and number porting timelines fixed in writing. Exit terms that rely on "we will work with you" rather than contractual commitments are not exit terms.
Hybrid pricing structures: the emerging enterprise model
In practice, most enterprise voice AI contracts in 2026 use a hybrid of two models rather than a pure instance of any one. Common hybrid structures include:
Platform + outcome tier. A monthly platform fee covers the software layer, compliance tooling, and dedicated enterprise support. Call volume above a minimum floor is billed per-resolution. Below the floor, the platform fee covers it. This structure benefits buyers with predictable minimum volumes who want upside-aligned variable pricing above that threshold — the vendor earns more when the programme scales, which aligns with the buyer's growth rather than penalising it.
Per-minute with containment performance credit. Standard per-minute billing is augmented by a quarterly credit applied when containment rate exceeds a contractual floor. If the agent achieves 85%+ containment rate, the buyer receives a usage credit applied to the next quarter's invoice. Below 80%, the vendor accepts a discounted rate for the shortfall period. This creates a risk-sharing mechanism that approximates per-resolution alignment without requiring a full resolution-definition negotiation — useful where the resolution definition is hard to encode precisely.
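The thresholds in this structure translate directly into a quarterly settlement rule. This minimal sketch uses the 85% and 80% thresholds described above. The 5% credit and 10% shortfall discount are hypothetical placeholders, since those amounts are left to negotiation. Between the two thresholds, no adjustment applies.

```python
from typing import Dict


def containment_adjustment(quarter_spend: float, containment_rate: float,
                           credit_threshold: float = 0.85, credit_share: float = 0.05,
                           floor: float = 0.80, shortfall_discount: float = 0.10) -> Dict[str, float]:
    """Quarterly adjustment under per-minute billing with a containment performance credit.

    next_quarter_credit: usage credit applied to the next quarter's invoice
    shortfall_discount: discount on the current period's spend when containment is below the floor
    """
    if not 0.0 <= containment_rate <= 1.0:
        raise ValueError("containment_rate must be a fraction between 0 and 1")
    credit = quarter_spend * credit_share if containment_rate >= credit_threshold else 0.0
    discount = quarter_spend * shortfall_discount if containment_rate < floor else 0.0
    return {"next_quarter_credit": credit, "shortfall_discount": discount}


for rate in (0.88, 0.82, 0.76):
    print(rate, containment_adjustment(20_000, rate))
```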
Blended rate with hard spend cap. A blended effective rate (per-minute plus LLM inference plus telephony, averaged across all components) with a hard monthly spend cap. Above the cap, additional calls are queued, deferred to the following period, or billed at a materially reduced marginal rate. This gives the buyer hard budget certainty without the complexity of a platform licence structure, and it caps the risk of a demand spike producing a surprise invoice.
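Once the blended rate is agreed, the cap itself is a small calculation. The sketch below assumes a single blended per-minute rate. It covers the two over-cap treatments that affect billing: deferring the excess to the next period, or billing it at a reduced marginal rate. Queuing is an operational choice and is not modelled. The function name and all figures are hypothetical.

```python
from typing import Optional, Tuple


def capped_invoice(minutes: float, blended_rate: float, spend_cap: float,
                   post_cap_rate: Optional[float] = None) -> Tuple[float, float]:
    """Return (amount billed this month, minutes deferred to the next period).

    post_cap_rate=None defers everything above the cap; otherwise minutes above
    the cap are billed at the reduced marginal rate and nothing is deferred.
    """
    if blended_rate <= 0:
        raise ValueError("blended_rate must be positive")
    uncapped = minutes * blended_rate
    if uncapped <= spend_cap:
        return uncapped, 0.0
    excess_minutes = minutes - spend_cap / blended_rate
    if post_cap_rate is None:
        return float(spend_cap), excess_minutes
    return spend_cap + excess_minutes * post_cap_rate, 0.0


# Hypothetical demand spike: 10,000 minutes at a £0.125 blended rate against a £1,000 cap.
print(capped_invoice(10_000, 0.125, 1_000))          # (1000.0, 2000.0)
print(capped_invoice(10_000, 0.125, 1_000, 0.0625))  # (1125.0, 0.0)
```

Note that the reduced-marginal-rate option softens the cap rather than enforcing it. Only deferral keeps the monthly invoice at or below the cap.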
Sector-specific pricing considerations
The optimal pricing model depends on what "resolution" means in your sector and how predictable your call characteristics are.
Financial services (FCA-regulated). In collections, the resolution is a payment received or an arrangement formally agreed — both CRM-auditable events that support genuine per-resolution billing. In KYC voice verification, the resolution is a confirmed identity signal in your identity platform. FCA Consumer Duty creates a particularly strong case for per-resolution models in complaints-intake workflows, because it aligns vendor economics with genuine case completion rather than call containment. For the regulatory context, see our guide on FCA Code of Conduct for voice AI in financial services.
Healthcare and NHS. Appointment scheduling and recall management have a clear resolution event — the appointment created in the scheduling system. This supports per-resolution billing with an action-completion definition. NHS procurement frameworks may require pricing to be mapped to a Crown Commercial Service or equivalent framework rate; independent per-resolution models may need to be declared as a framework call-off equivalent or justified separately in the tender documentation.
Contact centre and BPO. BPO operations often carry existing average handle time SLAs with their end-clients. Under per-minute billing, any reduction in handle time benefits both the BPO (lower AI cost per call) and the client (better throughput and lower seat cost). However, per-resolution billing is structurally more complex for BPOs, because the resolution is defined by the BPO's client rather than the BPO operator. Platform models with transparent, exportable usage reporting, which let the BPO pass costs through to clients accurately, tend to work better in this context.
Retail and ecommerce. Order-status and returns calls have a binary resolution: the enquiry was answered and the caller's action was taken (return initiated, order status confirmed, exchange booked). Per-resolution works well at steady volumes. During seasonal peaks — Black Friday through January returns — per-call may be safer if resolution rate is expected to fall under volume pressure, because a degraded resolution rate on a per-resolution model creates cost-per-outcome variance that is difficult to budget for.
Red flags in voice AI pricing proposals
When reviewing a pricing proposal, these patterns indicate the structure has been engineered to benefit the vendor more than the buyer:
"Resolution includes any call that does not escalate." Containment is a platform capability metric — it measures what the agent can handle without handing over to a human. It is not a business outcome. A call that is contained but unresolved creates a repeat contact. Reject containment as the sole resolution definition in any per-resolution commercial model.
Per-minute rates without a fully loaded stack breakdown. If the proposal shows a per-minute rate without a clear breakdown of what is and is not included (LLM inference, telephony, transcription, analytics, support), the actual cost will be meaningfully higher than the headline. Request a fully loaded cost-per-minute including all variable components, then model it against your average call duration and your expected resolution rate. Compare the result against the AI voice cost per call analysis benchmarks.
No volume commitment required from the buyer. A vendor offering per-minute pricing with no volume floor has either very low marginal infrastructure costs or is pricing in an expectation of churn. Either way, it is a signal that the per-minute rate may not be sustainable at scale — and you may face a repricing event at your first renewal cycle.
Exit timelines left "to be agreed." Any data export or number porting process without a contractual timeline is a switching-cost barrier. "We will work with you on exit" is not an exit term. Negotiate 30 days maximum for full data export, confirmed in writing.
Annual billing with monthly volume variability. If a vendor bills annually in advance but your call volume varies 40–60% between peak and off-peak months, you are prepaying for capacity you will not consume evenly. Negotiate monthly or quarterly billing at the same effective rate, or an annual pool with monthly drawdown flexibility.
A full weighted voice AI vendor scorecard including commercial terms scoring is available in our procurement series. The voice AI ROI attribution framework covers how to attribute cost-per-resolution across a multi-use-case programme.
Benchmarking effective cost-per-resolution
The ultimate metric that cuts across all four pricing models is effective cost-per-resolution (ECPR) — total programme cost divided by the number of calls that produced the buyer's defined outcome.
To calculate ECPR for your programme:
- Denominator: count only calls that hit your resolution definition — CRM event created, booking confirmed, payment received
- Numerator: total billed costs including all components — per-minute or per-call charges, LLM inference tokens, telephony egress, platform fees, integration costs, and support fees
- Benchmark: compare ECPR against the fully-loaded cost of a human agent completing the same task, including salary, overhead, attrition, management, and technology
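The three steps above reduce to one division and one comparison. In this minimal Python sketch, the cost categories mirror the numerator list. Every figure in the example is a hypothetical placeholder rather than a benchmark.

```python
from typing import Mapping


def ecpr(programme_costs: Mapping[str, float], resolved_calls: int) -> float:
    """Effective cost-per-resolution: all billed programme costs / calls meeting the definition."""
    if resolved_calls <= 0:
        raise ValueError("resolved_calls must be positive")
    return sum(programme_costs.values()) / resolved_calls


def human_cost_per_resolution(fully_loaded_cost: float, human_resolutions: int) -> float:
    """Benchmark: salary, overhead, attrition, management and technology / tasks completed."""
    if human_resolutions <= 0:
        raise ValueError("human_resolutions must be positive")
    return fully_loaded_cost / human_resolutions


monthly_costs = {  # hypothetical, GBP per month
    "usage_charges": 30_000, "llm_inference": 8_000, "telephony_egress": 4_000,
    "platform_fees": 5_000, "integration": 2_000, "support": 1_000,
}
ai = ecpr(monthly_costs, resolved_calls=20_000)
human = human_cost_per_resolution(fully_loaded_cost=120_000, human_resolutions=24_000)
print(ai, human)  # 2.5 5.0
```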
Once you have ECPR modelled across pricing options — per-minute at your average handle time and containment rate, per-call at your expected connect-and-resolution rate, per-resolution at vendor-proposed rates — the pricing decision becomes a model comparison rather than a rate comparison.
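To make the comparison concrete, the sketch below computes ECPR for the same hypothetical call volume under per-minute, per-call and per-resolution pricing, at two call lengths. It rests on three assumptions: the per-minute rate is already fully loaded, fixed costs (platform, integration, support) are identical across options, and every rate shown is a placeholder rather than a market price.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CallProfile:
    calls: int
    avg_minutes: float
    resolution_rate: float  # share of calls that meet the resolution definition


def ecpr_by_model(profile: CallProfile, per_minute_rate: float, per_call_fee: float,
                  per_resolution_fee: float, fixed_costs: float = 0.0) -> Dict[str, float]:
    """ECPR for one call profile under three pricing models with identical fixed costs."""
    resolutions = profile.calls * profile.resolution_rate
    if resolutions <= 0:
        raise ValueError("profile produces no resolutions")
    variable = {
        "per_minute": profile.calls * profile.avg_minutes * per_minute_rate,
        "per_call": profile.calls * per_call_fee,
        "per_resolution": resolutions * per_resolution_fee,
    }
    return {model: (cost + fixed_costs) / resolutions for model, cost in variable.items()}


short_calls = CallProfile(calls=1_000, avg_minutes=2.0, resolution_rate=0.5)
long_calls = CallProfile(calls=1_000, avg_minutes=5.0, resolution_rate=0.5)
for name, profile in (("short calls", short_calls), ("long calls", long_calls)):
    result = ecpr_by_model(profile, per_minute_rate=0.25, per_call_fee=0.75, per_resolution_fee=1.25)
    print(name, min(result, key=result.get), result)
```

In this toy example, per-minute produces the lowest ECPR only for the short-call profile. That is consistent with the exception for short, consistent calls. The point is the method, not the specific crossover.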
In most enterprise deployments, the per-resolution model with a well-defined action-completion definition produces the lowest ECPR for use cases where the outcome is digitally verifiable. The exception is programmes with high natural containment and short, consistent call durations — where per-minute can match per-resolution economics without the definition negotiation overhead.
Want to apply this framework? Try Dilr Voice with free credits, book an AI placement diagnostic to map your use cases to the right commercial model, or read about our approach to placing voice AI inside enterprise programmes.
Written by the Dilr.ai engineering team — practitioners who ship enterprise AI in production. Follow us on LinkedIn for shipping notes, or subscribe via the RSS feed.