Most enterprises signing a voice AI contract in 2026 spend the bulk of their evaluation time on the AI layer: the platform, the LLM, the analytics dashboards, the CRM integrations. The telephony layer — SIP trunking, PSTN termination, CLI presentation, the number estate — tends to follow the path of least resistance: whatever the AI platform vendor bundles.
That choice is low-friction at the time of signing. It becomes a constraint the moment you want to switch AI platforms, expand into new geographies, resolve a caller-ID reputation problem, or renegotiate pricing. Telephony infrastructure has different economics, different exit exposure, and categorically different evaluation criteria from AI platform selection. Treating them as a single purchase — which every bundled offering invites you to do — means you are accepting telephony terms that were negotiated for the vendor's operational convenience, not yours.
This guide is produced by the team behind Dilr Voice — enterprise voice AI deployed across 40+ countries. For help structuring the infrastructure layer of your programme, see our AI operating model consulting or the broader DATS deployment methodology.
This guide covers the evaluation framework for enterprise voice AI telephony: SIP trunking vs bundled telephony, CLI and number reputation management, E.164 porting and number portability, international coverage, failover architecture, and the exit clauses that determine whether you own your number estate when you leave.
Why telephony and AI platform selection are different decisions
The AI platform and the telephony layer serve different functions and carry different risk profiles.
The AI platform governs intelligence: the LLM, the conversation logic, the escalation design, the analytics, the integration layer. Switching AI platforms is primarily a software and configuration challenge. Prompts are portable. Call data can be exported. The pain of switching is real but recoverable in weeks to months.
The telephony layer governs connectivity: which physical network carries the call, what number the caller sees on their screen, how calls are routed to agents in a failure, and what happens to your number estate when you terminate the contract. These are not software problems. Phone numbers, once associated with a reputation (good or bad), carry that reputation for months after porting. PSTN interconnects take weeks to establish and test. Number porting in some jurisdictions takes 30 to 60 days per number.
Most enterprises that deploy AI voice agents discover the telephony distinction too late: after they have signed a 24-month platform contract that locks their number estate to the vendor's SIP infrastructure. By the time the platform relationship deteriorates (through pricing changes, an acquisition, or a capability gap), the number porting process alone has become a switching cost, one the vendor benefits from without ever having disclosed it.
The correct framing: the telephony layer is a utility infrastructure decision with a long operational horizon. The AI platform decision sits on top of it. Buy them separately, evaluate them separately, and contract them with separate exit rights.
Bundled telephony vs bring-your-own trunk (BYOT)
Enterprise voice AI platforms broadly offer two telephony models.
Bundled telephony means the platform vendor provides the SIP trunking and PSTN interconnects as part of the platform service. Your calls originate and terminate on their infrastructure. The number estate sits in their carrier account. CLI presentation is controlled by their settings. Pricing is bundled into per-minute or per-call rates that may or may not reflect underlying carrier costs.
Bring-your-own trunk (BYOT) means you procure SIP trunking independently — from Twilio, Vonage, BT, BICS, or a carrier of your choice — and connect it to the voice AI platform. Your numbers are in your own carrier account. Your costs are separated from platform costs. Your exit from the platform does not require migrating your number estate.
Most SMB-targeted platforms default to bundled telephony because it removes a purchasing step for smaller buyers. Enterprise buyers should evaluate whether that simplification is worth the long-term infrastructure lock-in.
The case for BYOT in enterprise deployments:
- Number portability. Numbers registered in your own carrier account are yours unconditionally. When you change AI platforms, you reconfigure the SIP endpoint. The numbers stay. Bundled-telephony numbers may require a porting window of 30–60 days per number, during which outbound campaigns or inbound routing must pause or be redesigned.
- Pricing transparency. Bundled per-minute rates combine PSTN termination costs, carrier margin, and platform margin. BYOT separates them. For high-volume programmes processing millions of minutes per month, the cost difference is material.
- Carrier SLA independence. If the AI platform has a performance incident, BYOT allows you to route calls to an alternative endpoint while the platform recovers. With bundled telephony, the carrier incident and the platform incident are the same incident.
- Regulatory compliance. In some jurisdictions (Germany, France, parts of APAC), local number regulations require that the entity holding the number have a local legal presence. A BYOT approach lets you register numbers directly with a carrier who satisfies these requirements; a bundled offering may or may not have equivalent coverage.
The case for bundled telephony:
- Deployment simplicity. For a first deployment, removing the SIP configuration step accelerates go-live by days to weeks.
- Single vendor support. When a call quality issue occurs, you have one support queue rather than diagnosing whether the fault is carrier-side or platform-side.
- Cost predictability. Bundled per-minute pricing is easier to budget than BYOT with separate carrier and platform invoices.
The enterprise decision is not binary. Many organisations use bundled telephony for their initial pilot and BYOT for production scale, or bundled telephony for inbound (where number ownership risk is lower) and BYOT for outbound (where CLI management is the operational priority). The voice AI orchestration vs platform decision influences this: orchestration-layer buyers typically use BYOT by necessity; managed-platform buyers often start bundled and migrate to BYOT as call volume grows.
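To make the pricing-transparency point concrete, the comparison can be sketched as simple arithmetic. Every rate and the monthly volume below are hypothetical placeholders rather than quotes from any carrier or platform; substitute the figures from your own proposals.

```python
from decimal import Decimal


def monthly_cost(minutes: int, *per_minute_rates: str) -> Decimal:
    """Monthly cost for a volume of minutes billed at the sum of the given rates.

    Rates are passed as strings so Decimal keeps them exact.
    """
    combined_rate = sum((Decimal(rate) for rate in per_minute_rates), Decimal("0"))
    return minutes * combined_rate


# Hypothetical figures for illustration only.
minutes = 2_000_000                              # minutes per month
bundled = monthly_cost(minutes, "0.050")         # one blended per-minute rate
byot = monthly_cost(minutes, "0.012", "0.030")   # carrier rate + platform rate
saving = bundled - byot

print(f"bundled: {bundled:.2f}  BYOT: {byot:.2f}  difference: {saving:.2f}")
```

The useful part is less the result than the fact that BYOT exposes the carrier and platform rates as separate inputs, so each can be benchmarked and renegotiated on its own.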
CLI management and number reputation
Caller Line Identification (CLI) — the number displayed on the recipient's screen — is one of the most underestimated operational variables in enterprise outbound voice AI programmes.
Contact rates on outbound campaigns depend heavily on what the caller sees before answering. A recognised, trusted number sees 3–5× the answer rate of an unknown or flagged number. A number flagged as "Potential Spam" or "Scam Likely" by iOS, Android, or third-party reputation services may as well not dial at all — answer rates fall below 5% on flagged numbers in some markets.
Number reputation is not static. It degrades in real time based on call patterns: excessive outbound volume from a single number, unusually short call durations (associated with robocall behaviour), calls before 8 AM or after 9 PM, and a high ratio of complaints or hang-ups to calls placed. Once a number is flagged, the reputation recovery window is 6 to 12 months, not weeks.
Enterprise voice AI programmes generating high outbound volume need a CLI management strategy from day one. That strategy has five components.
Number pools. Spread outbound volume across a pool of numbers rather than concentrating calls on a single CLI. The pool size should match your target daily call volume such that no individual number makes more than 20–30 calls per day.
Call pattern hygiene. AI voice agents should honour calling hours conventions (8 AM to 9 PM in the recipient's time zone, stricter in some jurisdictions — see GDPR and PECR compliance for AI outbound calling in the UK). Short calls should trigger investigation rather than being dismissed — a high ratio of sub-10-second calls indicates either wrong numbers or recipient hang-up behaviour that the campaign is ignoring.
Reputation monitoring. Subscribe to number reputation monitoring services (First Orion, TNS, Hiya) to detect flag events before they destroy a campaign. Many enterprises have no visibility into number reputation until a campaign's answer rate collapses. Proactive monitoring allows number rotation before a flagged CLI contaminates the entire campaign.
DNC suppression integration. CLI management and Do Not Call compliance are operationally linked. A robust DNC architecture in your AI voice dialler prevents calls to numbers that have opted out — one of the fastest ways to generate complaints that accelerate number flagging.
Carrier registration. In the US, the FCC's STIR/SHAKEN framework allows carriers to attest calls as originating from a legitimate, registered caller. Enterprise voice AI programmes operating outbound in the US should ensure their telephony provider supports full attestation (A-level), and that their number registration is current. UK callers can register numbers with Ofcom-affiliated schemes. In both markets, registration does not guarantee answer rates — but absence of registration increasingly triggers "unverified" labelling that is itself a reputation signal.
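The number-pool and call-pattern rules above reduce to a few small checks. The following is a minimal sketch: the thresholds come from the figures in this section (20 calls per number per day as the conservative end of the 20–30 range, an 8 AM to 9 PM window, sub-10-second calls), while the function names are hypothetical and the recipient's time zone is passed in as a fixed offset for simplicity. A production dialler would more likely resolve the zone from the recipient's number or address.

```python
import math
from datetime import datetime, time, timedelta, timezone, tzinfo

MAX_CALLS_PER_NUMBER_PER_DAY = 20   # conservative end of the 20-30 range
CALL_WINDOW_START = time(8, 0)      # 8 AM recipient-local
CALL_WINDOW_END = time(21, 0)       # 9 PM recipient-local
SHORT_CALL_SECONDS = 10


def required_pool_size(daily_calls: int,
                       max_per_number: int = MAX_CALLS_PER_NUMBER_PER_DAY) -> int:
    """Smallest pool that keeps every CLI at or under the daily cap."""
    if daily_calls <= 0:
        return 0
    return math.ceil(daily_calls / max_per_number)


def within_calling_hours(moment: datetime, recipient_tz: tzinfo) -> bool:
    """True if an aware datetime falls inside the recipient's calling window."""
    local = moment.astimezone(recipient_tz)
    return CALL_WINDOW_START <= local.time() < CALL_WINDOW_END


def short_call_ratio(durations_seconds: list) -> float:
    """Share of calls shorter than the short-call threshold."""
    if not durations_seconds:
        return 0.0
    short = sum(1 for d in durations_seconds if d < SHORT_CALL_SECONDS)
    return short / len(durations_seconds)


# Example: 5,000 calls per day, recipient at UTC-5.
eastern = timezone(timedelta(hours=-5))
print(required_pool_size(5000))
print(within_calling_hours(datetime(2026, 1, 15, 14, 0, tzinfo=timezone.utc), eastern))
print(short_call_ratio([4, 8, 45, 120]))
```

What ratio of short calls should trigger investigation is a policy decision this sketch deliberately leaves open.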
E.164 porting and number estate management
Telephone number portability — the ability to move a number from one carrier to another while retaining the same digits — is a legal right in most developed markets. In practice, porting is slow, can only be initiated by the gaining carrier, and requires the losing carrier's cooperation. In the UK, porting typically takes 5–15 working days for a geographic number. In the US, porting is governed by the FCC and takes 1–5 days for consumer lines; enterprise SIP trunk porting can take 2–4 weeks. In Germany and France, porting windows of 30+ days are common.
For a voice AI programme with 50 inbound numbers across three markets, a platform switch that requires full number porting could create weeks of simultaneous call routing disruption across every market. If the incumbent vendor does not cooperate (the losing carrier has limited obligation to cooperate faster than the minimum regulatory window), that disruption cannot be compressed.
Inventory your number estate before selecting a telephony model. A programme with 5 numbers and low inbound volume can absorb a porting window. A programme with 200 numbers across 8 countries and 24/7 inbound cannot. The scale of your number estate should directly inform whether BYOT is a procurement requirement, not a preference.
Require number ownership confirmation in the MSA. A well-drafted enterprise voice AI contract — consistent with the 11 MSA clauses enterprise legal teams require for voice AI — should include explicit confirmation that numbers registered under the service are the customer's property, with a defined porting-cooperation obligation on the vendor. Without this clause, you are trusting the vendor to cooperate in a porting process that is against their commercial interest.
Plan for transition assistance. The most practical mitigation for a porting window is maintaining parallel SIP routes during transition: the new vendor route receives live traffic while porting progresses, and rollback is possible if porting fails. This requires BYOT or, at minimum, a vendor who will tolerate a co-termination window without charging for both routes simultaneously.
E.164 format compliance. All enterprise telephony should use fully qualified E.164 numbers (country code + number, without spaces or dashes). Inconsistent number formatting across CRM, dialler, and telephony systems is the most common cause of DNC suppression failures and CLI mismatch errors. Voice AI CRM integrations — detailed in our CRM telephony integration architecture guide — should enforce E.164 normalisation at every write-back point.
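A normalisation step of the kind described can be sketched in a few lines. This is a minimal illustration, not a full numbering-plan parser: it assumes inputs already carry a country code, treats a leading `00` as the international prefix (true in many countries, but not in the US, which uses `011`), and rejects anything it cannot express as `+` followed by up to 15 digits. The function name is hypothetical.

```python
import re

# E.164: a plus sign, a non-zero leading digit, at most 15 digits in total.
E164_PATTERN = re.compile(r"^\+[1-9]\d{1,14}$")


def normalise_e164(raw: str) -> str:
    """Strip formatting characters and return a fully qualified E.164 number.

    Raises ValueError for numbers without a country code, so that national-format
    values are caught at write time instead of silently failing a DNC lookup.
    """
    cleaned = re.sub(r"[\s\-().]", "", raw)
    if cleaned.startswith("00"):          # assumption: 00 = international prefix
        cleaned = "+" + cleaned[2:]
    if not E164_PATTERN.fullmatch(cleaned):
        raise ValueError(f"not a fully qualified E.164 number: {raw!r}")
    return cleaned


print(normalise_e164("+44 20 7946 0958"))
print(normalise_e164("0044 (20) 7946-0958"))
```

Rejecting national-format numbers, instead of guessing a default country, is a deliberate choice here: a wrong guess produces a valid-looking number that matches nothing in the suppression list.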
International coverage and PSTN termination quality
Enterprise voice AI programmes that operate across geographies face a PSTN termination quality problem that no AI platform controls: the call quality between your SIP endpoint and the receiving PSTN varies by geography, carrier, and time of day.
PSTN termination quality has three dimensions that matter for voice AI specifically.
Latency. The round-trip time between the AI platform and the PSTN endpoint affects perceived conversation naturalness. A voice AI agent already has inherent latency from LLM inference and TTS synthesis. Adding 200ms of PSTN latency on top creates a conversation that feels broken to the caller. Enterprise deployments targeting geographies more than 2,000 miles from their AI platform's compute region should specify maximum PSTN latency in their telephony SLA — 80ms one-way is a reasonable enterprise target for markets with local PSTN interconnects.
Audio quality. G.711 codec (the PSTN standard) degrades at high jitter. SIP trunks using G.729 compression save bandwidth but introduce artefacts that particularly affect AI speech recognition accuracy. Enterprise voice AI programmes should specify G.711 with jitter buffer targets in their SIP configuration, not rely on defaults.
Coverage depth. "Global coverage" in a carrier's marketing means the carrier can terminate a call to any international number. It does not mean the carrier has local PSTN interconnects in every market. A call to a UK mobile number from a UK local carrier terminates in milliseconds. The same call from a US carrier with a UK "Point of Presence" may route through three hops before hitting the UK PSTN. Enterprise deployments in specific high-priority geographies should verify the carrier's in-country PSTN interconnect, not just their global routing capability.
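The latency point is easier to reason about as a budget. The sketch below adds two network legs (the caller's speech travelling to the platform and the reply travelling back) to the AI processing time. The 80ms one-way target is the figure from this section; the speech-recognition, LLM, and TTS timings are illustrative placeholders, not measurements of any platform.

```python
from dataclasses import dataclass

PSTN_ONE_WAY_TARGET_MS = 80  # enterprise target stated in this guide


@dataclass
class TurnLatency:
    pstn_one_way_ms: int
    asr_ms: int   # speech recognition (placeholder value)
    llm_ms: int   # LLM inference (placeholder value)
    tts_ms: int   # speech synthesis (placeholder value)

    def response_delay_ms(self) -> int:
        """Delay between the caller finishing and hearing the reply begin."""
        return 2 * self.pstn_one_way_ms + self.asr_ms + self.llm_ms + self.tts_ms

    def meets_pstn_target(self) -> bool:
        return self.pstn_one_way_ms <= PSTN_ONE_WAY_TARGET_MS


local_interconnect = TurnLatency(pstn_one_way_ms=40, asr_ms=150, llm_ms=400, tts_ms=150)
distant_route = TurnLatency(pstn_one_way_ms=200, asr_ms=150, llm_ms=400, tts_ms=150)

for label, turn in [("local", local_interconnect), ("distant", distant_route)]:
    print(label, turn.response_delay_ms(), turn.meets_pstn_target())
```

Because the network leg is counted twice per conversational turn, each extra millisecond of one-way PSTN latency costs two milliseconds of perceived delay.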
Failover and redundancy architecture
Voice AI telephony carries the same uptime expectations as human call centre telephony: 99.9% or better. The difference is that call centre telephony failure modes are well-understood and have decades of mitigation precedent. Voice AI telephony adds new failure points: the SIP trunk between the AI platform and the carrier, the carrier's PSTN interconnect, the AI platform's SIP media handling, and the underlying compute running the voice agent logic.
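It helps to translate an availability percentage into the downtime it actually permits. A minimal sketch, assuming a 30-day month as the measurement window (contracts vary, so check how your SLA defines the period):

```python
from decimal import Decimal


def allowed_downtime_minutes(availability_pct: str, days: int = 30) -> Decimal:
    """Minutes of downtime permitted per period at a given availability."""
    total_minutes = days * 24 * 60
    unavailable_fraction = (Decimal("100") - Decimal(availability_pct)) / Decimal("100")
    return total_minutes * unavailable_fraction


for target in ("99.9", "99.95", "99.99"):
    print(target, allowed_downtime_minutes(target))
```

At 99.9% the budget is 43.2 minutes per 30-day month, which a single untested failover can consume in one incident.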
A production enterprise deployment needs explicit failover design at each layer.
SIP trunk failover. A minimum of two SIP trunk routes (primary and failover) to different carrier PoPs, with automatic failover triggered on the primary trunk's registration failure. The failover SIP route should be load-tested quarterly — inactive failover routes frequently have expired credentials or misconfigured codecs that are only discovered during an actual outage.
Carrier diversity. For programmes where telephony failure has material business impact (inbound emergency-adjacent calls, time-critical outbound, healthcare), use two carriers for PSTN termination, not one with two trunks. Carrier incidents — PSTN routing tables, BGP issues, interconnect faults — can affect all trunks with a single carrier simultaneously.
Graceful degradation to human routing. When the AI voice agent is unavailable, calls should fail over to human agent routing, not ring out or go to voicemail. The failover routing design belongs in your voice AI incident response runbook; our voice AI incident response runbook framework covers this operational gap in detail.
SLA on telephony separately from the AI platform. The platform's SLA covers the AI layer. The carrier's SLA covers the PSTN layer. Bundled telephony creates ambiguity: when a call quality incident occurs, who owns it? The resolution is to require explicit telephony availability SLAs — distinct from platform availability SLAs — in any bundled offering. The voice AI SLA design guide for enterprise contracts covers the specific service levels that protect enterprise buyers, including telephony availability, call quality degradation thresholds, and financial remedies for SLA breach.
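The layered failover described above can be sketched as a small routing decision. This is an illustration of the logic only: the `SipRoute` type, route names, and the `registered` health flag are hypothetical, and a real deployment would implement this in the SBC or carrier routing configuration and not in application code.

```python
from dataclasses import dataclass
from typing import List, Optional

HUMAN_QUEUE = "human-agent-queue"  # hypothetical destination name


@dataclass(frozen=True)
class SipRoute:
    name: str
    carrier: str
    priority: int      # lower number = preferred
    registered: bool   # result of the latest registration health check


def select_route(routes: List[SipRoute]) -> Optional[SipRoute]:
    """Pick the highest-priority route whose registration is healthy."""
    healthy = [r for r in routes if r.registered]
    return min(healthy, key=lambda r: r.priority) if healthy else None


def has_carrier_diversity(routes: List[SipRoute]) -> bool:
    """True when routes span at least two carriers, not two trunks on one."""
    return len({r.carrier for r in routes}) >= 2


def route_call(routes: List[SipRoute], ai_agent_available: bool) -> str:
    route = select_route(routes)
    if route is None:
        return "no-route"  # every trunk is down: page the on-call engineer
    destination = "ai-agent" if ai_agent_available else HUMAN_QUEUE
    return f"{route.name}->{destination}"


routes = [
    SipRoute("primary", carrier="carrier-a", priority=1, registered=False),
    SipRoute("failover", carrier="carrier-b", priority=2, registered=True),
]
print(route_call(routes, ai_agent_available=True))
print(route_call(routes, ai_agent_available=False))
```

Keeping trunk selection separate from the AI-or-human decision mirrors the point of the section: a telephony fault and a platform fault are different incidents and need different fallbacks.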
Evaluation criteria: the telephony procurement checklist
Enterprise telephony evaluation for voice AI differs from traditional telephony procurement because the AI layer creates additional requirements that a standard carrier questionnaire does not capture.
- Number ownership model: BYOT preferred; confirm portability rights in the MSA
- SIP trunk redundancy: ≥2 routes, different PoPs, quarterly failover test
- Codec specification: G.711 ulaw/alaw, jitter buffer ≤40ms, no G.729 compression
- PSTN one-way latency target: ≤80ms to in-country PSTN interconnect
- STIR/SHAKEN attestation (US outbound): full A-level attestation required
- CLI management capabilities: number pool rotation, reputation monitoring, per-number analytics
- DTMF pass-through: RFC 2833 DTMF (now specified in RFC 4733) required for IVR bypass and payment flows
- Call recording compliance: pause-resume recording support for PCI DSS payment flows
- Porting cooperation clause: written obligation to begin porting cooperation within 5 business days of request
- Incident escalation SLA: P1 telephony incident acknowledgement ≤30 minutes, resolution ≤4 hours
The DTMF requirement deserves specific attention. Voice AI platforms that handle payment flows (collecting card numbers, PIN verification, or authentication codes) must use DTMF tone capture to prevent spoken digits from entering the call recording. Reliable DTMF relay depends on how the SIP trunk is configured, is not present in all SIP configurations, and is typically not tested during a standard voice AI evaluation. Specify it explicitly in your SIP configuration and test it in UAT before go-live.
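The measurable rows of the checklist lend themselves to an automated pass over a carrier's questionnaire answers. The sketch below encodes only the numeric and yes/no thresholds from the checklist; the `CarrierAnswers` structure and its field names are hypothetical, and contractual rows such as number ownership and porting cooperation still need legal review.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class CarrierAnswers:
    sip_routes: int
    distinct_pops: int
    codecs: Set[str]
    jitter_buffer_ms: int
    pstn_one_way_latency_ms: int
    supports_rfc2833_dtmf: bool
    p1_ack_minutes: int
    p1_resolution_hours: int


def checklist_gaps(a: CarrierAnswers) -> List[str]:
    """Return the checklist rows that the carrier's answers fail to meet."""
    gaps = []
    if a.sip_routes < 2 or a.distinct_pops < 2:
        gaps.append("SIP trunk redundancy")
    if not a.codecs & {"G.711u", "G.711a"}:
        gaps.append("Codec specification")
    if a.jitter_buffer_ms > 40:
        gaps.append("Jitter buffer")
    if a.pstn_one_way_latency_ms > 80:
        gaps.append("PSTN one-way latency")
    if not a.supports_rfc2833_dtmf:
        gaps.append("DTMF pass-through")
    if a.p1_ack_minutes > 30 or a.p1_resolution_hours > 4:
        gaps.append("Incident escalation SLA")
    return gaps


candidate = CarrierAnswers(
    sip_routes=2, distinct_pops=1, codecs={"G.729"}, jitter_buffer_ms=40,
    pstn_one_way_latency_ms=95, supports_rfc2833_dtmf=True,
    p1_ack_minutes=30, p1_resolution_hours=4,
)
print(checklist_gaps(candidate))
```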
Voice AI vendor lock-in: how telephony becomes the mechanism
The broader risk of voice AI vendor consolidation manifests concretely in telephony. An AI platform that controls your telephony layer controls your switching cost. An acquisition or pricing change creates leverage precisely because porting your number estate is the slowest step in a migration.
The contract clauses that protect enterprise buyers in this scenario are specific.
Number portability warranty. The vendor warrants that numbers are registered in the customer's name (or transferable to a carrier of the customer's choice) and that the vendor will initiate porting cooperation within five business days of a customer request.
Exit data format. On contract termination, the vendor must provide a complete export of: all numbers in the programme, the SIP endpoint configuration for each number, all call data routing rules, and all phone number registrations in E.164 format — formatted for import into a standard SIP trunk provider.
Transition period telephony. The vendor must maintain telephony services for 30 days post-contract-end at existing rates, to allow porting completion without service interruption. The voice AI SLA design guide and the MSA contract clauses that enterprise legal teams require include provisions along these lines — but telephony-specific porting and exit clauses must be explicitly negotiated, not assumed to follow from general exit terms.
Total cost of ownership impact. The hidden cost of telephony lock-in does not appear in the per-minute rate. It appears in the cost of a forced renewal — when you would have switched but could not absorb the number porting disruption. This switching cost should be estimated and included in your voice AI total cost of ownership model at the time of initial procurement, not discovered when the renewal arrives.
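The exit data format clause is easier to enforce if you can check a vendor's export mechanically. A minimal sketch follows, assuming the export arrives as one record per number; the field names are hypothetical and should be replaced with whatever the contract schedule actually specifies.

```python
import re
from typing import Dict, List

# Hypothetical field names covering the items listed in the exit clause.
REQUIRED_FIELDS = ("e164", "sip_endpoint", "routing_rule", "registered_holder")
E164 = re.compile(r"^\+[1-9]\d{1,14}$")


def export_problems(rows: List[Dict[str, str]]) -> List[str]:
    """List every missing field or malformed number in a number-estate export."""
    problems = []
    for index, row in enumerate(rows):
        missing = [f for f in REQUIRED_FIELDS if not row.get(f)]
        if missing:
            problems.append(f"row {index}: missing {', '.join(missing)}")
        number = row.get("e164", "")
        if number and not E164.fullmatch(number):
            problems.append(f"row {index}: {number!r} is not E.164")
    return problems


export = [
    {"e164": "+442079460958", "sip_endpoint": "sip:inbound@sip.example.net",
     "routing_rule": "uk-support-queue", "registered_holder": "Example Ltd"},
    {"e164": "020 7946 0958", "sip_endpoint": "",
     "routing_rule": "uk-sales-queue", "registered_holder": "Example Ltd"},
]
for problem in export_problems(export):
    print(problem)
```

Running a check like this against a sample export during procurement, not at termination, turns the exit clause from a promise into something already tested.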
What to ask your voice AI vendor before signing
Enterprise procurement teams should put five questions to any voice AI vendor offering bundled telephony before signing. These are not hostile questions — they are reasonable diligence that any credible vendor can answer clearly.
1. In whose name are the phone numbers registered? Provide the carrier account holder's legal name. If the vendor's name appears on the carrier account, numbers are the vendor's asset, not yours.
2. What is the process and timeline for porting a number to an alternative carrier if we choose to leave the platform? The answer should specify regulatory minimums per market and the vendor's cooperation process. Vague answers indicate the vendor has not considered this scenario.
3. Does the platform support BYOT SIP trunking, and if so, what are the configuration requirements? A platform that cannot support BYOT is a platform that cannot be decoupled from its telephony layer.
4. What telephony availability SLA is contractually committed, separately from the AI platform availability SLA? If the vendor cannot separate these, they are treating telephony as an undifferentiated part of the platform — which is the signal that they are not managing it as infrastructure.
5. What data format is provided on contract termination for number estate export? The answer should specify E.164 formatted export with SIP routing configuration. "We'll provide what you need" is not an answer.
An evasive response to any of these questions, especially question one, should be treated as a significant procurement risk. If your numbers are registered in the vendor's name with the carrier, porting cooperation is at the vendor's discretion, not your right.
Telephony infrastructure is not a feature of your AI platform — it is independent utility infrastructure that should be evaluated, contracted, and exited separately.
- Confirm number ownership in writing before signing any bundled offering
- Specify BYOT or require explicit porting-cooperation clauses in the MSA
- Require separate telephony SLAs with financial remedies for breach
- Include transition-period telephony and number export requirements in exit provisions
- Test CLI reputation and STIR/SHAKEN attestation before first outbound campaign
Written by the Dilr.ai engineering team — practitioners who ship enterprise voice AI in production. Follow us on LinkedIn for shipping notes, or subscribe via the RSS feed.