Voice AI ROI Attribution: The Credit Stack Your CFO Will Sign

Every voice AI business case we have seen this year, from £150k pilots to £4M multi-year deployments, has been written by operations and read by finance. The two teams use different definitions of value. Operations writes the case in cost-per-call deltas and containment rates. Finance reads it as a question: which of these numbers can I actually credit to the P&L, and how do I verify them? When the answer is "you cannot, easily", the deal stalls.

This is the attribution problem. It is the most underdiscussed reason that voice AI pilots stall short of enterprise rollout, and the single biggest gap in the public content on enterprise voice AI ROI. This post walks through how to fix it.

This guide is shipped by the team behind Dilr Voice — enterprise voice AI live in 40+ countries — and is grounded in the DATS five-stage methodology we use to place AI inside the enterprise P&L.

Why cost-savings business cases fail finance review

The default voice AI business case is a cost-savings story. Take call volume, multiply it by the gap between the human-handled and AI-handled cost-per-call, and present the result as savings. This is the framing that drove early Dialpad / Five9 economics and most of the case studies we covered in the cost-per-call piece. It is also the framing most likely to lose at finance review.

There are three structural reasons it fails.

1. Cost savings without headcount change are accounting fiction. A CFO reading "we will save £840k by reducing average handle time across 200 agents by 30 seconds" knows what that line actually contains: a small slice of each agent's time, distributed across 200 people, with no single role removed. Without a workforce plan that translates the saved time into eliminated roles or absorbed growth, the number cannot be booked. The 2025 McKinsey State of AI survey found 71% of enterprises use generative AI weekly but only 6% capture material EBIT impact. That EBIT gap is largely an attribution gap, not a capability gap.

2. Containment-rate uplift is not the same as revenue saved. A pilot that lifts containment from 38% to 64% reads as a 26-point win. But the financial credit depends entirely on what happens to the 26% of callers who would have escalated and now do not. If they would have churned anyway, the credit is small. If they would have closed a higher-value renewal, the credit is large. Operations cannot answer that question. Finance has to.

3. The "rescued revenue" line is the most-claimed and least-believed. Every voice AI deck contains a line like "AI follow-up rescues £2.4M of stale leads per year". That line will be deleted in finance review unless it is attached to a counterfactual reach-rate that the company actually had before the deployment, measured in the same data warehouse.

These are not hypothetical problems. They are why a 2026 ServiceNow Enterprise AI Maturity Index reading shows only 15% of enterprises are Optimising or Leading on AI — the majority are stuck at Implementing or Scaling, which is where attribution discipline breaks down. If you are about to write a voice AI business case in this environment, read our voice AI TCO post first, and then come back here for the revenue half of the model.

71%	Enterprises using gen-AI weekly (McKinsey 2025)
6%	Capturing material EBIT impact
15%	Optimising or Leading on AI (ServiceNow 2026)
2.5×	EBIT impact, AI leaders vs laggards (BCG 2025)

The four credit lines a CFO will actually book

A voice AI business case that survives finance review has four — and only four — credit lines. Each one has a verifiable counterfactual, a measurable post-deployment value, and a clear owner who can sign off on the delta. They are summarised below, and then walked through individually in the rest of this post.

Credit line	What gets booked	Counterfactual	Owner
Headcount avoided	FTE cost not incurred against a growth plan	Pre-existing workforce model showing planned hires	CFO + COO
SLA reach-rate uplift	Revenue from contacts you would not otherwise have made	Historical reach rate at the same lead-age cohort	Revenue ops
Recovered deal margin	Gross margin on rescued opportunities	Pre-AI conversion rate at the same campaign	CFO + CRO
DSO improvement	Working-capital release on collections	12-month rolling DSO baseline	Treasury / Finance

The four together form a credit stack. Add them, subtract the true TCO (platform fees + integration + change management + governance overhead), and the result is the EBIT impact the CFO can actually credit. Anything outside this stack — perceived NPS improvement, "agent productivity uplift", brand benefit — should be in the narrative but not in the numbers.
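As a minimal sketch of that arithmetic, the stack can be modelled as four named credit lines netted against TCO. The class and function names (`CreditStack`, `Tco`, `net_ebit_impact`) and every figure below are illustrative assumptions, not numbers from a real deployment.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CreditStack:
    """The four bookable credit lines, in GBP (illustrative figures only)."""
    headcount_avoided: float
    reach_rate_uplift: float
    recovered_deal_margin: float
    dso_improvement: float

    def total(self) -> float:
        return sum(getattr(self, f.name) for f in fields(self))


@dataclass(frozen=True)
class Tco:
    """True total cost of ownership, in GBP (illustrative figures only)."""
    platform_fees: float
    integration: float
    change_management: float
    governance_overhead: float

    def total(self) -> float:
        return sum(getattr(self, f.name) for f in fields(self))


def net_ebit_impact(stack: CreditStack, tco: Tco) -> float:
    """Credit stack minus true TCO: the figure finance can credit."""
    return stack.total() - tco.total()


# Hypothetical inputs, not taken from any real deployment.
stack = CreditStack(
    headcount_avoided=400_000,
    reach_rate_uplift=300_000,
    recovered_deal_margin=150_000,
    dso_improvement=100_000,
)
tco = Tco(
    platform_fees=250_000,
    integration=80_000,
    change_management=40_000,
    governance_overhead=30_000,
)
print(f"Net EBIT impact: £{net_ebit_impact(stack, tco):,.0f}")
```

Because the stack has exactly four fields, narrative items such as NPS or brand benefit have nowhere to sit in the numbers, which is the point of the structure.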

Each credit line has a specific way it gets measured and a specific way it goes wrong. The next four sections walk through them.

Credit line 1: Headcount avoided

The cleanest credit line is the one the CFO can verify against a workforce plan that already exists. If the operations team was planning to hire 18 agents to handle Q3 volume growth, and voice AI handles 60% of the increment, that is 10.8 hires not made. Multiplied by fully-loaded cost (salary + on-cost + training + management overhead), it is the credit line every CFO can defend in a board meeting.

The reason this works is the counterfactual is documented. The workforce plan exists. The hiring plan exists. The vacancies were on the books. Voice AI did not save money in the abstract — it prevented documented spend.

The trap is claiming headcount avoidance where no growth plan existed. If you were not going to hire those agents anyway, the "avoidance" is fictional and finance will catch it. The fix is to run the AI placement diagnostic against a documented operations plan before the pilot starts, so the counterfactual is locked in.
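The calculation and the trap can be sketched together. This is a rough illustration, assuming a hypothetical function name (`headcount_avoided_credit`) and a placeholder fully-loaded cost of £45,000 per hire; only the 18 planned hires and the 60% share come from the example in this section.

```python
def headcount_avoided_credit(
    planned_hires: float,
    ai_share_of_increment: float,
    fully_loaded_cost_per_hire: float,
    plan_documented: bool,
) -> float:
    """Credit = hires not made x fully-loaded cost, only against a documented plan."""
    if not plan_documented:
        # No pre-existing workforce plan means no counterfactual, so no credit.
        return 0.0
    if not 0.0 <= ai_share_of_increment <= 1.0:
        raise ValueError("ai_share_of_increment must be between 0 and 1")
    hires_avoided = planned_hires * ai_share_of_increment
    return hires_avoided * fully_loaded_cost_per_hire


# 18 planned hires and a 60% AI share come from the example in this section;
# the £45,000 fully-loaded cost is a hypothetical placeholder.
credit = headcount_avoided_credit(18, 0.60, 45_000, plan_documented=True)
print(f"Headcount avoided: £{credit:,.0f}")

undocumented = headcount_avoided_credit(18, 0.60, 45_000, plan_documented=False)
print(f"Without a documented plan: £{undocumented:,.0f}")
```

Returning zero when no plan is documented mirrors how finance is likely to treat the claim: the saving may be real, but it cannot be booked.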

For inbound deployments, this is the most credible line. For outbound deployments — where the human team often has not yet been hired because the volume cannot be staffed economically — the credit shifts to credit line 2.

Credit line 2: SLA reach-rate uplift

Outbound and follow-up workflows have a different attribution structure. The human team cannot reach every lead in the SLA window. Voice AI can. The credit is not "cost saved" — it is "revenue created from contacts that would not otherwise have been made". This is the line that finance teams under-credit because operations does not present it correctly.

The right way to present it: take the pre-deployment reach rate inside the SLA window (say, 38% of leads reached within 24 hours), and the post-deployment reach rate (say, 89%). The delta, 51 percentage points, is the contact uplift. Apply it to the number of leads in the cohort to get the additional contacts, then multiply by the conversion rate on those contacts and the average revenue per converted contact, and you have the revenue created. This is the framework we use in the enterprise outbound sales post and the real estate lead qualification piece.
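A minimal sketch of that calculation, assuming the delta is applied to the number of leads in the cohort and that revenue is counted per converted contact (both are our reading of the method). The function name and the lead volume, conversion rate, and deal value are hypothetical; only the 38% and 89% reach rates come from the example in this section.

```python
def reach_uplift_revenue(
    leads: int,
    pre_reach_rate: float,
    post_reach_rate: float,
    conversion_rate: float,
    revenue_per_conversion: float,
) -> float:
    """Revenue from contacts that would not otherwise have been made."""
    for rate in (pre_reach_rate, post_reach_rate, conversion_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rates must be expressed as fractions between 0 and 1")
    # Contact uplift: the reach-rate delta applied to the lead cohort.
    additional_contacts = leads * (post_reach_rate - pre_reach_rate)
    return additional_contacts * conversion_rate * revenue_per_conversion


# 38% and 89% are the reach rates used in the text; the rest is hypothetical.
revenue = reach_uplift_revenue(
    leads=10_000,
    pre_reach_rate=0.38,
    post_reach_rate=0.89,
    conversion_rate=0.05,
    revenue_per_conversion=2_000,
)
print(f"Revenue created: £{revenue:,.0f}")
```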

The verification step is the one most pilots skip. The pre-deployment reach rate must come from the same data warehouse that will be reporting the post-deployment number, against the same lead-age cohort, with the same definition of "reached". If those three conditions are not met, finance will reject the comparison as a like-for-unlike. This is the most common reason a credible-looking reach uplift gets struck from the credit stack at audit.
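The three conditions can be checked mechanically before the comparison goes anywhere near finance. The sketch below is illustrative only: the `ReachMeasurement` record, its field names, and the sample values are assumptions, not a schema from any particular warehouse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReachMeasurement:
    """One reach-rate reading plus the metadata finance will ask about."""
    warehouse: str
    lead_age_cohort: str
    reached_definition: str
    reach_rate: float


def like_for_like_objections(pre: ReachMeasurement, post: ReachMeasurement) -> list[str]:
    """Return the reasons finance could reject the pre/post comparison."""
    checks = [
        ("warehouse", "different data warehouse"),
        ("lead_age_cohort", "different lead-age cohort"),
        ("reached_definition", 'different definition of "reached"'),
    ]
    return [
        message
        for attribute, message in checks
        if getattr(pre, attribute) != getattr(post, attribute)
    ]


# Hypothetical readings: same warehouse and cohort, but the definition drifted.
pre = ReachMeasurement("finance_dwh", "0-24h", "live conversation", 0.38)
post = ReachMeasurement("finance_dwh", "0-24h", "call attempted", 0.89)
print(like_for_like_objections(pre, post))
```

An empty list means the comparison is like-for-like; anything else is an objection to resolve before the uplift goes into the stack.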

Credit line 3: Recovered deal margin

Where reach-rate uplift measures contacts, recovered deal margin measures conversion. The model is: of the contacts voice AI made that human agents would not have made (credit line 2), what proportion converted to revenue? Multiply by gross margin, not gross revenue. Finance teams credit gross margin lines, not revenue lines, because revenue without margin is a working-capital cost.

The trap here is double-counting. If credit line 2 has already booked the revenue value of the reached contact, credit line 3 cannot also book the revenue. The correct partition is: line 2 books expected revenue using the pre-deployment conversion rate; line 3 books the uplift in conversion rate caused by faster, more consistent first contact. The two should reconcile to total margin without overlap.
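One way to sketch the partition, assuming both lines are expressed in gross margin and that the conversion uplift is applied only to the additional contacts from credit line 2. The function name and all inputs are hypothetical.

```python
def partition_margin(
    additional_contacts: float,
    pre_conversion_rate: float,
    post_conversion_rate: float,
    gross_margin_per_deal: float,
) -> tuple[float, float]:
    """Split margin on additional contacts into line 2 and line 3 without overlap."""
    # Line 2: additional contacts valued at the pre-deployment conversion rate.
    line_2 = additional_contacts * pre_conversion_rate * gross_margin_per_deal
    # Line 3: only the conversion-rate uplift on those same contacts.
    uplift = post_conversion_rate - pre_conversion_rate
    line_3 = additional_contacts * uplift * gross_margin_per_deal
    return line_2, line_3


# Hypothetical inputs.
contacts, pre_rate, post_rate, margin = 5_100, 0.05, 0.07, 800
line_2, line_3 = partition_margin(contacts, pre_rate, post_rate, margin)
total_margin = contacts * post_rate * margin

print(f"Line 2: £{line_2:,.0f}")
print(f"Line 3: £{line_3:,.0f}")
print(f"Reconciles to total margin: {abs(line_2 + line_3 - total_margin) < 0.01}")
```

The reconciliation check is the useful part: if line 2 plus line 3 exceeds margin at the post-deployment conversion rate, something has been counted twice.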

For collections workflows — see our fintech collections piece for the FCA framing — the credit shifts from margin to recovery rate. A 4-point lift on a 22% collections recovery rate on a £24M overdue book is £960k of recovered receivables. That number is credible because the book and the recovery rate both exist in the finance system already.

Credit line 4: DSO improvement

Days Sales Outstanding is the credit line that almost never appears in a voice AI business case, and almost always should. Faster, more consistent contact on overdue invoices compresses DSO by 2–6 days in the deployments we have seen. On an £80M revenue business with a 38-day DSO, a 4-day compression releases roughly £880k of working capital (£80M ÷ 365 × 4): a one-time cash release that stays released for as long as DSO holds at the lower level.
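The working-capital figure follows from average daily revenue multiplied by the days of compression. A short sketch, assuming a 365-day year and that all revenue is invoiced on credit terms; the function name is ours.

```python
def working_capital_released(
    annual_revenue: float,
    days_compressed: float,
    days_in_year: int = 365,
) -> float:
    """Receivables released when DSO falls by `days_compressed` days."""
    average_daily_revenue = annual_revenue / days_in_year
    return average_daily_revenue * days_compressed


# The £80M revenue and 4-day compression are the figures used in this section.
released = working_capital_released(80_000_000, 4)
print(f"Working capital released: £{released:,.0f}")
```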

This is the line treasury teams will credit immediately because it shows up in the cash conversion cycle directly. It is also the line that requires the deepest integration with the AR system — the AI needs to know which invoices are overdue, by how much, against which payment terms, and route them in priority order. The architecture for this is what we cover in the voice AI orchestration vs platform piece: the credit only materialises if the integration goes all the way to the receivables ledger, not just to the dialler.

How to structure the model finance will accept

The full attribution model is six steps. Each one closes a specific objection finance teams raise. The discipline is in the sequencing: you cannot skip step 1 to get to step 5, because step 5 borrows the counterfactual from step 1.

The six-step attribution model

1. Lock the counterfactual. Pre-deployment reach, containment, conversion, and DSO, measured in the same warehouse and signed off by finance before the pilot starts.

2. Tag every credit line to an owner. Headcount avoided owned by COO. Revenue uplift owned by CRO. DSO owned by Treasury. No floating ownership.

3. Cap revenue lines at gross margin. Top-line revenue without margin is working capital, not P&L. Convert every revenue credit to gross margin before stacking.

4. Net out TCO honestly. Platform + integration + change management + governance + ongoing optimisation. The hidden line is governance; do not skip it.

5. Run sensitivity at ±25%. Show three scenarios: base, downside, upside. If downside is still cash-positive in year one, the case is durable.

6. Build the audit trail. Every credit line ties back to a data warehouse query the auditor can re-run. Without this, the credit gets struck at year-end review.
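The ±25% sensitivity test is simple enough to sketch. We assume here that the swing is applied to the credit stack while year-one TCO is held fixed, which is one reasonable reading rather than the only one; the function names and figures are hypothetical.

```python
def sensitivity_scenarios(
    base_credits: float,
    year_one_tco: float,
    swing: float = 0.25,
) -> dict[str, float]:
    """Net year-one position under downside, base, and upside credit scenarios."""
    credits_by_scenario = {
        "downside": base_credits * (1 - swing),
        "base": base_credits,
        "upside": base_credits * (1 + swing),
    }
    return {
        name: credits - year_one_tco
        for name, credits in credits_by_scenario.items()
    }


def is_durable(base_credits: float, year_one_tco: float, swing: float = 0.25) -> bool:
    """Durable means the downside scenario is still cash-positive in year one."""
    return sensitivity_scenarios(base_credits, year_one_tco, swing)["downside"] > 0


# Hypothetical figures.
scenarios = sensitivity_scenarios(base_credits=950_000, year_one_tco=400_000)
for name, net in scenarios.items():
    print(f"{name:>8}: £{net:,.0f}")
print("Durable:", is_durable(950_000, 400_000))
```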

This is the structure we deploy in DATS engagements. Step 1 — locking the counterfactual — is the one teams skip and the one that costs them the credit at year-end. If the pre-deployment baseline was not captured before the pilot started, the auditor has no like-for-like to compare against, and the EBIT credit gets reclassified as "operational improvement, unverified". That is the difference between a board-approved expansion and a stalled renewal.

Where the model breaks: three failure modes

We have seen three failure modes wreck otherwise sound attribution models. They are not technical problems; they are model-design problems.

Failure mode 1: containment treated as savings. A containment-rate lift from 38% to 64% gets multiplied by a cost-per-contact figure and presented as savings. The trap is that the "saved" contact still exists — the AI handled it. The credit is the difference in cost-to-serve, not the elimination of the contact. The fix is to keep cost-per-contact in the model on both sides (AI-handled and human-handled) and book the delta, not the gross.
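The difference between the gross claim and the bookable delta is easy to show in numbers. In this sketch the contact volume and both cost-per-contact figures are hypothetical; only the 38% and 64% containment rates come from the text.

```python
def containment_credit(
    annual_contacts: float,
    pre_containment: float,
    post_containment: float,
    human_cost_per_contact: float,
    ai_cost_per_contact: float,
) -> tuple[float, float]:
    """Return (gross claim, bookable delta) for a containment-rate lift."""
    # Contacts that moved from human-handled to AI-handled.
    shifted = annual_contacts * (post_containment - pre_containment)
    # The inflated figure: treats the shifted contacts as if they vanished.
    gross_claim = shifted * human_cost_per_contact
    # The bookable figure: the AI still handled them, so only the delta counts.
    bookable_delta = shifted * (human_cost_per_contact - ai_cost_per_contact)
    return gross_claim, bookable_delta


# Hypothetical volume and costs; 38% -> 64% is the lift used in the text.
gross, delta = containment_credit(100_000, 0.38, 0.64, 4.00, 1.00)
print(f"Gross claim:    £{gross:,.0f}")
print(f"Bookable delta: £{delta:,.0f}")
```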

Failure mode 2: pilot economics extrapolated to full rollout. Pilots run on the easiest 20% of calls — the ones the team picked because they would demonstrate value cleanly. The remaining 80% include hard intents, regulated workflows, multi-system look-ups, and the long tail of edge cases. The marginal credit per call decays as you scale. The fix is to discount pilot economics by 30–40% when modelling full rollout, and tighten the estimate after the first quarter of production data. We cover this in detail in the pilot purgatory piece.
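A rough sketch of the discounting step, assuming the 30–40% discount is applied uniformly to the per-call credit observed in the pilot. The function name and the pilot figures are hypothetical.

```python
def rollout_credit_range(
    pilot_credit_per_call: float,
    rollout_calls: float,
    discount_low: float = 0.30,
    discount_high: float = 0.40,
) -> tuple[float, float]:
    """Return (conservative, optimistic) rollout credit after discounting."""
    undiscounted = pilot_credit_per_call * rollout_calls
    conservative = undiscounted * (1 - discount_high)
    optimistic = undiscounted * (1 - discount_low)
    return conservative, optimistic


# Hypothetical pilot result: £2.00 credit per call, 500,000 calls at rollout.
conservative, optimistic = rollout_credit_range(2.00, 500_000)
print(f"Rollout credit: £{conservative:,.0f} to £{optimistic:,.0f}")
```

The range is a starting estimate; once a quarter of production data exists, the observed per-call credit replaces the discounted pilot figure.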

Failure mode 3: governance treated as a one-time cost. Implementation cost is a one-time line. Governance — the model risk, QA, compliance monitoring, escalation review, and ongoing optimisation — is a recurring line that scales with volume. Models that omit it look better in year one and worse in year three. The fix is to budget governance at 8–12% of platform spend on an ongoing basis and net it out in the credit stack.

For regulated industries — FCA-supervised financial services, HIPAA-regulated healthcare, EU AI Act Article 50 deployments — governance can run higher. The 8–12% range is a floor, not a ceiling.
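To see why omitting governance flatters year one, here is a minimal multi-year sketch. It assumes flat annual platform spend and a governance rate charged every year as a share of that spend; the function name and the figures are hypothetical, and a real model would let both scale with volume.

```python
def yearly_tco(
    implementation_cost: float,
    annual_platform_spend: float,
    governance_rate: float,
    years: int,
) -> list[float]:
    """Cost per year: implementation once, platform and governance every year."""
    costs = []
    for year in range(1, years + 1):
        one_time = implementation_cost if year == 1 else 0.0
        governance = annual_platform_spend * governance_rate
        costs.append(one_time + annual_platform_spend + governance)
    return costs


# Hypothetical figures, with governance at 10% (inside the 8–12% range).
with_governance = yearly_tco(120_000, 300_000, 0.10, years=3)
without_governance = yearly_tco(120_000, 300_000, 0.0, years=3)
understated = sum(with_governance) - sum(without_governance)

print("With governance:   ", [f"£{c:,.0f}" for c in with_governance])
print("Without governance:", [f"£{c:,.0f}" for c in without_governance])
print(f"Three-year understatement: £{understated:,.0f}")
```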

What "good" looks like in 2026

A voice AI business case that survives finance review in 2026 looks roughly like this. The pre-deployment counterfactual is locked. The credit stack contains the four lines above with named owners. Revenue credits are capped at gross margin. TCO is netted out honestly, with governance budgeted as ongoing. Sensitivity is run at ±25%, and downside is cash-positive in year one. The audit trail ties every credit line back to a warehouse query.

In our deployments, a case structured this way produces a credit stack in the £800k–£3.2M range for a mid-market deployment (£150k–£400k platform spend), and £4M–£12M for an enterprise deployment (£600k–£1.4M platform spend). Payback in the 6–11 month range is normal. Anything claiming sub-3-month payback should be pressure-tested — the integration alone usually takes longer.

The wider context: BCG's Widening AI Value Gap analysis found that AI leaders earn 2.5× more EBIT impact than laggards. The gap is not in the technology — most enterprises are running similar platforms. The gap is in attribution discipline. The 6% of enterprises capturing material EBIT impact are the ones that locked the counterfactual, defined the credit lines, and held finance and operations to the same set of numbers.

If you want to see how this is structured in practice — what the model looks like, what the pre-deployment data capture covers, and how the credit lines feed into the operating cadence — that is what the AI operating model engagement delivers in 8–12 weeks. For deployments already in production, the execution office covers the ongoing attribution discipline and audit trail.

Want to see this in production? Try Dilr Voice live (free, $20 credits), book an AI placement diagnostic, or read more about our approach to placing AI inside enterprise systems.

Written by the Dilr.ai engineering team — practitioners who place voice AI inside enterprise P&L. Follow us on LinkedIn for shipping notes, or subscribe via the RSS feed.
