A UK Big Six supplier handles roughly 30 to 45 million inbound customer contacts a year. The contact-centre P&L behind that volume runs at £6 to £12 per handled call. Most of those calls are not commercial conversations — they are meter reads, tariff change requests, balance enquiries, direct-debit changes, and smart-meter appointment shuffles. Repetitive, low-judgement, high-volume.
Then Ofgem changed the rules. The Consumer Standards that took effect through 2024 raised the bar on contact accessibility, on the duty to identify customers in vulnerable circumstances, and on the speed at which suppliers must offer support to those struggling to pay. Star ratings now sit on every supplier's homepage. Service quality is no longer a back-office metric — it is a regulator-graded, public commercial input.
This is the exact problem voice AI was built for: high-volume, low-judgement work that hides a small percentage of calls where getting it wrong is regulatory and reputational dynamite. The trick is not deploying AI for the 90 per cent. It is making sure the 10 per cent never reaches the bot in the first place.
This guide is shipped by the team behind Dilr Voice — enterprise voice AI live across regulated UK industries. For the methodology layer, see DATS, our five-stage AI placement system.
UK energy suppliers can take 60 to 75 per cent of inbound call volume off human queues with voice AI — but only if the vulnerability detection layer is the first thing built, not the last. Ofgem's Consumer Standards make "we'll patch that later" a regulatory exposure, not a roadmap item.
- Meter-read and balance enquiry calls are 80 per cent plus containable
- Vulnerability triggers must hand off in under 8 seconds, every time
- The cost case clears at roughly 200,000 calls per month and above
The mathematics behind the business case is unromantic. A supplier with 10 million accounts handles call volume in the order of 250,000 to 400,000 calls per week. Even containing the bottom-quartile, lowest-judgement traffic — meter reads, smart-meter installation reschedules, balance and last-bill questions — yields seven-figure annual savings before anyone touches a tariff conversation. The unit economics work. The regulatory mathematics does not, until the vulnerability layer is in place.
Where the cost actually sits
Most suppliers underestimate how much of their inbound call mix is genuinely automatable. The standard internal estimate is "around 30 per cent" — usually based on a back-of-envelope split between transactional and advisory work. The real number, when call volume is properly tagged, is closer to 60 to 75 per cent.
The reason is that suppliers conflate call type with call complexity. A balance enquiry from a customer at risk of disconnection is a vulnerability conversation, not a balance enquiry. A tariff change from a customer who is on the Priority Services Register is a duty-of-care conversation, not a sale. Containment-rate models that don't separate these two layers either over-promise (everything is automatable) or under-deliver (nothing is, because the regulator might be watching).
The three call families that matter
Voice AI containment in utilities should be modelled across three call families, not against a single weighted average:
- Hard transactional — meter reads, balance, last-bill explanation, direct-debit changes. Containment ceiling: 85 to 95 per cent.
- Soft advisory — tariff comparisons, smart-meter scheduling, moving home journeys. Containment ceiling: 55 to 70 per cent.
- Regulated / vulnerable — payment difficulty, Priority Services Register, complaint escalation. Containment ceiling: 0 per cent. These calls must be detected and human-routed without ambiguity.
The economics live in family one. The regulatory survival lives in family three.
Why existing IVR doesn't get there
Most utility IVRs predate large-language-model voice agents by a decade. They are decision trees with hold music. They route by topic, not by signal. They detect "I want to read my meter" but not "I want to read my meter because I have £4 left on my prepay and a child at home." The traditional approach to fixing this is to bolt a chatbot on top — which fails for the same reason: it cannot read the second sentence.
The gating logic is everything. Vulnerability detection is not a fallback — it is the first conditional in the call flow. Get that wrong and the cost savings are paid back in penalties, complaint handling, and brand damage at multiples no business case survives.
The pattern repeats across regulated UK consumer industries — we mapped a near-identical containment-vs-duty curve for property managers in our piece on AI voice for property management tenant calls, where the vulnerability signal sits in housing condition rather than fuel poverty. The architectural lesson is the same: the gate comes before the bot, not after it.
Building the vulnerability detection layer
A proper detection layer in an energy voice AI deployment runs on three signals stacked, not one:
- Account signals — Priority Services Register flag, payment-plan status, prepayment meter, complaint history. Read from the CRM at call connect, before the agent speaks.
- Linguistic signals — explicit keywords ("disconnect", "can't afford", "child", "ill"), implicit affect (volume drops, hesitation, sobbing), and hedging patterns. These are model-detected, not regex-matched.
- Behavioural signals — call frequency in the last 14 days, prior IVR drop-offs, time of day. A third call this week from the same number at 23:40 is itself a vulnerability signal.
When any one of those crosses its threshold, the call goes to a human agent within 8 seconds. Not "next available agent". A trained agent on the Priority queue. The bot does not attempt resolution. The bot does not even attempt warmth. It hands off, with a structured summary so the customer doesn't have to start again.
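As a sketch, that first conditional might look like the following Python. Every name and threshold here (the keyword list, the 0.7 affect cutoff, the queue label, the `CallSignals` shape) is an illustrative assumption, not a production API — in particular, real linguistic detection is model-based, and the keyword match below is a crude stand-in:

```python
import re
from dataclasses import dataclass

# Illustrative trigger vocabulary and thresholds; real values are tuned
# against tagged call data, and detection is model-based, not keyword-based.
EXPLICIT_KEYWORDS = {"disconnect", "can't afford", "child", "ill"}
AFFECT_THRESHOLD = 0.7  # 0..1 distress score from an assumed affect model

def _has_keyword(text: str) -> bool:
    # Word-boundary match so "ill" does not fire on "bill"
    return any(re.search(rf"\b{re.escape(k)}\b", text) for k in EXPLICIT_KEYWORDS)

@dataclass
class CallSignals:
    # Account signals, read from the CRM at call connect
    psr_flag: bool = False
    on_payment_plan: bool = False
    prepayment_meter: bool = False
    # Linguistic signals, detected while the caller speaks
    transcript: str = ""
    affect_score: float = 0.0
    # Behavioural signals
    calls_last_14_days: int = 0
    late_night_call: bool = False

def vulnerability_gate(s: CallSignals) -> dict:
    """First conditional in the call flow: any single signal family
    crossing threshold routes to the Priority queue with a summary."""
    triggers = []
    if s.psr_flag or s.on_payment_plan or s.prepayment_meter:
        triggers.append("account")
    if _has_keyword(s.transcript.lower()) or s.affect_score >= AFFECT_THRESHOLD:
        triggers.append("linguistic")
    if s.calls_last_14_days >= 3 or s.late_night_call:
        triggers.append("behavioural")
    if triggers:
        # No resolution attempt, no small talk: hand off with context.
        return {"route": "human_priority_queue", "sla_seconds": 8,
                "handover_summary": {"triggers": triggers}}
    return {"route": "ai_agent", "sla_seconds": None, "handover_summary": None}
```

The design point is the return shape: the gate never answers the customer, it only routes, and the structured summary travels with the call so the human agent starts warm.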
| Call family | Volume share | Human cost / call | AI cost / call | Containment | Notes |
|---|---|---|---|---|---|
| Meter reads | ~22% | £6.50 | £0.45 | 92% | CRM write-back required |
| Balance enquiries | ~18% | £8.10 | £0.55 | 78% | PSR check at entry |
| Tariff changes | ~14% | £11.20 | £0.85 | 65% | Cooling-off rules apply |
| Smart meter scheduling | ~9% | £9.40 | £0.60 | 80% | Calendar integration |
| Billing disputes | ~12% | £14.60 | £1.10 | 38% | Most go to human |
| Vulnerability / hardship | ~7% | £22.80 | n/a | 0% | Always human, prioritised |
| Complaint escalation | ~6% | £18.40 | n/a | 0% | Always human, regulated |
| Other / general enquiry | ~12% | £9.80 | £0.95 | 55% | Mixed routing |
The honest containment-weighted blended cost saving lands at 52 to 58 per cent — not the 90 per cent number some vendor decks suggest. We've written before about why those vendor numbers don't survive contact with regulated reality, in our cost-economics piece on AI voice cost per call. The savings are still substantial. They are just not magical.
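The weighted-cost arithmetic behind a blended figure can be reproduced from the table above. The sketch below deliberately ignores handover overhead on flagged calls and platform cost, so it brackets rather than reproduces the quoted range: the naive whole-mix version lands below it, the automatable-only version above it.

```python
# (family, volume share %, human £/call, AI £/call, containment)
# Figures copied from the table above; "n/a" AI cost is modelled as None.
MIX = [
    ("meter_reads",       22,  6.50, 0.45, 0.92),
    ("balance",           18,  8.10, 0.55, 0.78),
    ("tariff_changes",    14, 11.20, 0.85, 0.65),
    ("smart_meter_sched",  9,  9.40, 0.60, 0.80),
    ("billing_disputes",  12, 14.60, 1.10, 0.38),
    ("vulnerability",      7, 22.80, None, 0.00),
    ("complaints",         6, 18.40, None, 0.00),
    ("other",             12,  9.80, 0.95, 0.55),
]

def blended_saving(mix):
    """Cost saving vs an all-human baseline, weighted by volume share."""
    baseline = sum(share * human for _, share, human, _, _ in mix)
    with_ai = sum(
        share * (cont * (ai if ai is not None else human) + (1 - cont) * human)
        for _, share, human, ai, cont in mix
    )
    return 1 - with_ai / baseline

whole_mix = blended_saving(MIX)  # always-human families stay in the denominator
automatable = blended_saving([r for r in MIX if r[3] is not None])
```

Run against the table's numbers, the whole-mix saving comes out around 46 per cent and the automatable-only saving around 61 per cent; a realistic blended figure sits between the two once handover and platform costs are loaded back in, which is why the honest answer is a range, not a point.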
What good deployment looks like in 2026
The maturity gap is wide. McKinsey's State of AI 2025 found roughly 88 per cent of enterprises now use AI but only 6 per cent capture material EBIT impact. In utilities, the gap is wider still — adoption is happening, but the deployments that survive Ofgem scrutiny are a small subset of those.
The deployments that work share four properties. They are sequenced as build phases, not feature requests.
Phase 1 — Vulnerability layer first. Before a single transactional flow goes live, the vulnerability detection layer is in production, instrumented, and tested. Not as a fallback. As the gate. This is non-negotiable. The Energy UK Vulnerability Commitment and Ofgem's parallel rules effectively mandate that every customer interaction surfaces the question of vulnerability before the conversation gets transactional. A voice agent that can't do this in the first eight seconds of the call cannot be deployed.
Practically, this means: PSR lookup at call connect, multi-signal linguistic detection in real time, an 8-second SLA to human handover when triggered, and full transcript-level audit trails for every flagged call. The audit layer is what survives a Citizens Advice complaint review or an Ofgem information request.
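One way to make that audit layer concrete, as a minimal sketch — the record shape, field names, and storage convention are assumptions, not a prescribed schema:

```python
import json
from dataclasses import dataclass, asdict

HANDOVER_SLA_SECONDS = 8.0  # the SLA described above

@dataclass
class HandoverRecord:
    call_id: str
    trigger: str            # which signal fired, e.g. "psr_flag"
    flagged_at: float       # epoch seconds when the gate fired
    agent_joined_at: float  # epoch seconds when a Priority-queue agent connected
    transcript_ref: str     # pointer into transcript retention

    @property
    def sla_met(self) -> bool:
        return (self.agent_joined_at - self.flagged_at) <= HANDOVER_SLA_SECONDS

def audit_line(rec: HandoverRecord) -> str:
    """One append-only JSON line per flagged call: the artefact that
    answers a Citizens Advice review or an Ofgem information request."""
    entry = asdict(rec)
    entry["sla_met"] = rec.sla_met
    return json.dumps(entry, sort_keys=True)
```

The point of recording `sla_met` per call, rather than averaging it, is that the regulatory question is always about a specific customer on a specific date, never about the mean.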
Phase 2 — Highest-volume, lowest-judgement traffic. Once the gate is live, the first transactional flow goes to the call type with the highest volume and the cleanest data path: meter reads. The CRM write-back is single-field, the failure mode is benign (read submitted but the caller also wants to discuss something else → escalate), and the volume is large enough to prove the unit economics within 60 days of go-live. Tariff changes come next. Billing disputes never come — they are too messy for voice AI economics to clear.
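That benign failure mode can be sketched in a few lines. `crm_submit` is a hypothetical single-field write-back callable, and both the digit extraction and the intent word list stand in for the production speech models:

```python
def handle_meter_read(transcript: str, crm_submit) -> str:
    """Contain the read if possible; escalate on any secondary intent.
    Digit extraction and the intent word list are crude stand-ins for
    the production speech and intent models."""
    digits = "".join(ch for ch in transcript if ch.isdigit())
    if not digits:
        return "escalate: no read captured"
    crm_submit(digits)  # single-field CRM write-back
    # Benign failure mode: the read is in, but the caller raised something else
    if any(w in transcript.lower() for w in ("bill", "afford", "complain", "tariff")):
        return "escalate: secondary intent after read submitted"
    return "contained"
```

Note the ordering: the write-back happens before the escalation check, so even an escalated call leaves the meter read captured — the work done so far is never thrown away.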
Phase 3 — Outbound, sparingly. Outbound voice AI in utilities is nervous regulatory territory. Smart-meter installation reminders, planned-outage notifications, and tariff-end notices are defensible — they have a clear consumer-benefit purpose and pre-existing consent. Anything that smells like a sales call is not. The legal architecture sits inside PECR and GDPR direct-marketing law, not just Ofgem's Consumer Standards. Suppliers who get this wrong don't just lose money; they invite ICO enforcement under PECR.
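The dialling gate that keeps outbound defensible can be this blunt. The purpose taxonomy below is an illustrative assumption — the legal test lives in PECR and GDPR, not in code:

```python
# Service purposes with a clear consumer-benefit basis and pre-existing
# consent. Illustrative taxonomy; legal sign-off defines the real list.
PERMITTED_OUTBOUND = {"smart_meter_reminder", "planned_outage", "tariff_end_notice"}

def may_dial(purpose: str, has_service_consent: bool) -> bool:
    """Refuse anything marketing-shaped outright, whatever the consent flags say."""
    return purpose in PERMITTED_OUTBOUND and has_service_consent
```

A deny-by-default whitelist is the whole design: a new outbound purpose requires a code change and therefore a review, rather than a config toggle someone flips on a Friday.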
Want to see how this is built in production? Try Dilr Voice live, book an AI placement diagnostic, or read about our approach to placing AI inside regulated UK enterprises.
The contrarian read on AI voice in utilities is this: the vendors winning the next two years will not be the ones with the cleanest demo. They will be the ones whose vulnerability gates have been independently audited, whose handover SLAs are contractually guaranteed, and whose transcript retention satisfies Ofgem's evidentiary standard. The technology piece is now a commodity. The compliance piece is the moat.
For incumbent suppliers, the implication is procurement-shaped. The vendor evaluation criteria need to start with: "show us the vulnerability layer, in production, with audit logs, before we discuss containment rates." Anyone who reverses that order is selling a demo, not infrastructure.
Place AI voice where it survives Ofgem.
30-min scoping call · No deck · Confidential. We'll model your containment, design the vulnerability gate, and tell you whether the business case clears.
Written by the Dilr.ai engineering team — practitioners who ship enterprise voice AI in regulated UK industries. Follow us on LinkedIn for shipping notes, or subscribe via the RSS feed.