Voice AI and PCI DSS: Handling Spoken Card Numbers

The moment a caller reads a 16-digit card number aloud, the recording becomes payment card data under PCI DSS. It does not matter whether your voice agent understood the number, stored it, or forwarded it. If the audio exists, the cardholder data exists — and your programme is in scope.

Most enterprise voice AI deployments discover this during a compliance audit, when a QA analyst realises that months of call recordings contain spoken card numbers in plain audio. Remediation at that point costs ten times what prevention costs. The PCI DSS scope problem with voice AI is not exotic — it is the most commonly missed architecture decision in payments, retail, utilities, and financial services deployments.

This guide is the architecture conversation you should have before go-live: what PCI DSS v4.0 actually requires for voice channels, which of the three descoping approaches fits your deployment model, and the implementation checklist that keeps your voice programme out of a PCI audit finding.

This guide is published by the team behind Dilr Voice — enterprise voice AI live across payments, utilities, and regulated financial services. For architecture-level compliance support, see our AI placement diagnostic.

v4.0: PCI DSS version enforceable from March 2024

80%: of PCI breaches involve cardholder data in storage systems (Verizon DBIR)

Requirements in PCI DSS v4.0, several with new voice-channel obligations

88%: of enterprises use AI in some form, many in voice channels where PCI scope applies (McKinsey 2025)

What PCI DSS actually covers in a voice channel

PCI DSS v4.0 defines cardholder data as the primary account number (PAN) and associated data — cardholder name, service code, and expiry date. It defines sensitive authentication data (SAD) as the full magnetic stripe, CAV2/CVC2/CVV2 values, and PINs. Both categories are strictly regulated; SAD cannot be stored after authorisation under any circumstances, including encryption.

For a voice AI deployment, the scope question turns on how cardholder data enters your system. There are three entry points that enterprise teams routinely underestimate:

1. Spoken by the caller into the recording. A caller reads their 16-digit card number aloud. If you record audio, the recording contains cardholder data. The recording system, storage infrastructure, access controls, and all downstream analytics pipelines enter PCI DSS scope immediately.

2. Transcribed from audio. If your voice AI transcribes calls — as most enterprise deployments do for analytics, QA, and CRM population — the transcript may contain cardholder data. The transcription service, the transcript storage layer, and any CRM or analytics tool receiving transcripts are in scope if PANs appear in them.

3. Passed as a function call parameter. If the voice agent is designed to take card data as a tool call — passing it to a payment API — the agent environment, the orchestration layer, and the tool-calling infrastructure are in scope.

The critical principle is: scope follows data, not intent. Even if your agent was not designed to handle payments, if callers happen to read card numbers into billing enquiry calls, account management calls, or dispute resolution calls, you have PCI exposure in your recording infrastructure. The compliance architecture must account for caller behaviour, not just script design.
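For the third entry point, one way to keep the tool-calling layer honest is to reject any tool-call argument that looks like a card number before it reaches the orchestration layer. The sketch below is illustrative only: the argument shape, the `assert_no_pan` helper, and the `CardDataInToolCall` exception are hypothetical names, and the check is deliberately over-strict (any run of 13 to 19 digits is treated as PAN-shaped).

```python
import re
from typing import Any

# 13 to 19 digits, optionally separated by single spaces or hyphens (PAN-shaped).
PAN_SHAPED = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")


class CardDataInToolCall(ValueError):
    """Raised when a tool-call argument looks like it carries a card number."""


def assert_no_pan(args: Any, path: str = "args") -> None:
    """Walk a (hypothetical) tool-call argument tree and fail closed on PAN-shaped text."""
    if isinstance(args, dict):
        for key, value in args.items():
            assert_no_pan(value, f"{path}.{key}")
    elif isinstance(args, (list, tuple)):
        for index, value in enumerate(args):
            assert_no_pan(value, f"{path}[{index}]")
    elif isinstance(args, (str, int)) and not isinstance(args, bool):
        if PAN_SHAPED.search(str(args)):
            # Report the location only; never echo the offending value into logs.
            raise CardDataInToolCall(f"PAN-shaped value at {path}")
```

A guard like this would run on every tool call, not only on payment tools, because scope follows data: a token such as `tok_8f3a` passes, while a spoken-then-transcribed card number placed in any field is refused.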

PCI DSS v4.0 contains two requirements directly relevant to voice deployments that many legal and compliance teams have not fully absorbed:

Requirement 3.3.1: SAD must not be retained after authorisation, even if encrypted. This rules out keeping encrypted audio of spoken card verification codes in compliance archives. An encrypted WAV file containing a spoken CVV2 or CVC2 value is non-compliant under v4.0.
Requirement 12.3.2: A targeted risk analysis must be documented for each requirement met with the customized approach rather than the defined, prescriptive approach. If you rely on pause-and-resume or DTMF masking as a customized control, you must formally document why that control adequately addresses the risk, not just implement it.

For context on how call recording obligations interact with broader data compliance, the multi-jurisdiction call recording consent map covers how GDPR, PECR, and sector-specific rules layer onto the PCI recording requirement.

The three descoping architectures

Enterprise voice deployments handling payment calls use one of three approaches to manage PCI DSS scope. Each has a different risk profile, implementation cost, and operational overhead. The right choice depends on your call mix, telephony infrastructure, and existing payment processing stack.

Architecture 1: Pause-and-resume recording

The most commonly deployed approach. When a caller is about to enter payment data, the recording is paused programmatically. The agent takes the payment, the recording resumes after card data entry, and no cardholder data enters the recording system.

How it works: The voice platform sends a signal to the telephony layer — either via a SIPREC extension header or an API call to the recording controller — that suspends audio capture. The signal must be triggered before the caller starts reading card data, not after. This requires the agent to explicitly hand off payment collection before the caller speaks any card data, which in turn requires reliable intent detection: the LLM or flow logic must identify payment intent and trigger the pause before the caller begins.

The control failure mode: Pause-and-resume works only if the agent correctly identifies every payment interaction and triggers the pause reliably. If a caller volunteers card data in an unexpected context — "can I just pay over the phone now?" during a complaint call — and the script did not anticipate that path, the recording catches it. The control is only as strong as your script discipline and your LLM's intent classification accuracy.
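The trigger ordering and failure mode described above can be sketched as a fail-closed wrapper: the agent asks for the pause first, prompts for card data only once the pause is confirmed, and always attempts to resume afterwards. This is a minimal sketch under stated assumptions; `PauseGuard`, the `pause` and `resume` callables, and the audit event names are hypothetical and stand in for whatever your recording controller actually exposes.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class PauseGuard:
    """Fail-closed wrapper around a hypothetical recording-controller API."""
    pause: Callable[[], bool]    # assumed to return True once capture has stopped
    resume: Callable[[], bool]   # assumed to return True once capture has restarted
    audit: List[Tuple[float, str]] = field(default_factory=list)

    def _log(self, event: str) -> None:
        self.audit.append((time.time(), event))

    def collect_payment(self, take_payment: Callable[[], str],
                        fallback: Callable[[], str]) -> str:
        # Request the pause BEFORE the caller is prompted for card data.
        try:
            paused = self.pause()
        except Exception:
            paused = False
        if not paused:
            # No confirmed pause: do not prompt for card data at all.
            self._log("pause_failed")
            return fallback()
        self._log("paused")
        try:
            return take_payment()
        finally:
            # Attempt to resume even if the payment step raised.
            self._log("resumed" if self.resume() else "resume_failed")
```

The audit list is what a monitoring process for pause failures would read; in this sketch a failed pause routes the caller to the fallback (for example a human transfer) instead of continuing with the recording still running.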

PCI DSS v4.0 implications: If pause-and-resume is your descoping architecture, your SAQ or ROC must document the trigger mechanism, evidence that the pause covers the full cardholder data entry window, controls preventing agent access to cardholder data during the payment window, and a monitoring process for pause failures. Under PCI DSS v4.0, pause-and-resume deployments are typically assessed under SAQ C (agents processing cardholder data without card storage) or SAQ C-VT (hosted payment virtual terminal environments), depending on how card data is routed after capture.

Architecture 2: DTMF masking (touch-tone card entry)

The caller enters their card number using the phone keypad rather than speaking it. The DTMF tones are captured by the telephony layer, converted to card data, and forwarded directly to the payment processor — never entering the voice AI recording or transcription path.

How it works: The agent prompts the caller to enter their card number via keypad. The telephony platform intercepts the DTMF tones before they reach the recording layer, converts them to a structured data payload, and transmits that payload securely to the payment gateway. The audio recording continues throughout the call — the tones are muted or replaced with a neutral audio signal — but the recording contains no intelligible card data at any point.
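The routing described above can be illustrated with a toy event splitter: keypad digits go only to the gateway payload, and the recording path receives a neutral placeholder in their place. The event feed format, the `split_dtmf` function, and the `MASK` placeholder are assumptions made for illustration; a real telephony platform performs this at the media layer.

```python
from typing import Iterable, List, Tuple

MASK = "<masked-tone>"  # placeholder standing in for a neutral tone in the recording


def split_dtmf(events: Iterable[Tuple[str, str]],
               expected_digits: int = 16) -> Tuple[List[str], str]:
    """Route keypad digits away from the recording path.

    `events` is a hypothetical telephony feed of ("audio", frame) and
    ("dtmf", digit) tuples. Returns (recording_frames, digits_for_gateway).
    """
    recording: List[str] = []
    digits: List[str] = []
    for kind, value in events:
        if kind == "dtmf":
            if len(digits) < expected_digits:
                digits.append(value)   # goes only to the payment gateway payload
            recording.append(MASK)     # the recorder gets a neutral tone instead
        else:
            recording.append(value)    # ordinary speech is recorded as usual
    return recording, "".join(digits)
```

The point of the design is that the recording list never holds a digit, so nothing downstream of the recorder (transcription, analytics, summaries) can receive one.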

The architectural advantage: DTMF masking is the cleanest descoping approach because card data never enters the voice AI pipeline in any form. There is no dependency on agent logic, LLM reliability, or script discipline for the PCI control. The recording and transcription layers are cleanly out of scope.

The user experience trade-off: Keypad entry creates more friction for callers than spoken entry, particularly on mobile devices. In high-volume payment contexts (utilities, telecoms, financial services renewals), DTMF masking is nonetheless the standard approach because the friction is manageable and the architectural cleanliness is worth it.

PCI DSS v4.0 implications: With DTMF masking properly implemented, your voice AI recording infrastructure can be scoped out of PCI DSS entirely. Your telephony provider handling the DTMF path must be a PCI-compliant payment service provider (PSP), and you must maintain contractual evidence of their current compliance posture. Under v4.0, DTMF masking deployments may qualify for SAQ A, which covers card-not-present merchants (including telephone-order) that have fully outsourced their cardholder data functions, or may allow you to rely fully on your PSP's compliance scope. SAQ A-EP applies to e-commerce channels only, so confirm eligibility with your QSA.

Architecture 3: Full agent descoping

The voice agent is architecturally prevented from ever receiving cardholder data. Payment collection is handed off entirely to a PCI-compliant payment service provider, and the agent receives only the transaction outcome (success, failure, or reference token) via a structured callback.

How it works: When a payment interaction is required, the agent triggers a handoff to a PCI-compliant IVRS operated by a qualified payment processor, or to a DTMF entry flow hosted by the PSP, and waits for a callback with the outcome. The agent never processes, stores, or has access to cardholder data in any form. From the agent's perspective, the payment interaction is a black box that returns only an outcome: success, failure, or a reference token.

When to use this architecture: Full descoping is appropriate when your voice AI is deployed across multiple use cases and payment is one of several functions; when your organisation wants maximum simplicity in PCI audit scope with no dependency on voice AI vendor compliance posture; or when you have an existing PSP relationship with a compliant IVRS capability you can leverage.

The integration overhead: Full descoping requires a clean integration between your voice AI platform and the PSP's payment collection system. Handoff latency, callback reliability, and failure handling must be designed carefully. A failed callback after a completed payment creates a customer experience failure and a reconciliation problem. The integration design must include explicit timeout handling, retry logic, and a fallback to human transfer if the payment handoff fails.
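The timeout, retry, and fallback requirements above can be sketched as follows. This is a sketch under stated assumptions, not a PSP integration: `start_handoff`, `poll_outcome`, and `transfer_to_human` are hypothetical callables, and the outcome strings are illustrative. Note that what gets retried is the outcome lookup, never the charge itself.

```python
from typing import Callable, Optional


def payment_handoff(start_handoff: Callable[[], str],
                    poll_outcome: Callable[[str, float], Optional[str]],
                    transfer_to_human: Callable[[str], str],
                    timeout_s: float = 30.0,
                    max_polls: int = 3) -> str:
    """Hand payment to the PSP and wait for an outcome, never for card data.

    All three callables are hypothetical integration points:
    start_handoff() returns a PSP session reference; poll_outcome(ref, timeout)
    returns "success", "failure", or None when no callback arrived in time.
    """
    ref = start_handoff()
    for _ in range(max_polls):
        outcome = poll_outcome(ref, timeout_s)
        if outcome in ("success", "failure"):
            return outcome
    # Outcome unknown: the payment may have completed, so do not retry the charge.
    return transfer_to_human(ref)
```

Passing the session reference to the human transfer gives the receiving agent something to reconcile against, which addresses the failed-callback-after-completed-payment case.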

What PCI DSS v4.0 changed for voice AI

PCI DSS v3.2.1 had significant ambiguity about voice channels. Version 4.0, enforceable from March 2024, resolved several of those ambiguities in ways that directly affect voice AI deployments. If your compliance team is still operating under v3.2.1 assumptions, the following changes require attention.

Requirement 3.3.1: No SAD storage, even encrypted. Some organisations have argued that encrypted SAD, including audio recordings of spoken card verification codes, could be retained if access controls and encryption were strong enough. Requirement 3.3.1 leaves no room for that argument: SAD must not be retained after authorisation is complete, regardless of encryption status. An encrypted WAV file containing a spoken card verification code is a v4.0 finding. A recording that contains only the PAN is not SAD, but it keeps the recording store in scope and subject to the PAN protection requirements.

Requirement 12.3.2: Targeted risk analysis for the customized approach. If you implement pause-and-resume or DTMF masking under the customized approach, as a control that substitutes for a defined requirement, you must now formally document a targeted risk analysis. The analysis must identify the specific risk being addressed, the controls in place, the residual risk, and how the controls are monitored over time. Auditors are explicitly looking for this documentation: a control that exists in architecture but lacks a risk analysis document does not satisfy 12.3.2.

Requirements 8.3.6 and 8.6.3: Password and service account authentication. Requirement 8.3.6 sets minimum password length and complexity for user accounts, and Requirement 8.6.3 covers passwords for application and system accounts, including complexity and periodic change. If your voice agent authenticates to a payment API using a static service credential, that credential is in scope and must meet those requirements.

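Requirements 12.8.4 and 12.9.2: Vendor compliance confirmation. Requirement 12.8.4 obliges you to monitor the PCI DSS compliance status of your third-party service providers at least once every 12 months, and Requirement 12.9.2 obliges service providers to supply that information on request. In practice, your voice AI vendor and telephony provider must confirm their own PCI DSS compliance status annually. You cannot rely on a historic AOC (Attestation of Compliance) from a prior year; you need current, dated confirmation. Build vendor compliance confirmation into your annual contract renewal cycle and your third-party risk management programme.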
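The annual confirmation rule lends itself to a simple automated check in a third-party risk register. The sketch below assumes you hold each vendor's AOC date; the function name, the vendor labels, and the 365-day threshold are illustrative choices, not terms from the standard.

```python
from datetime import date, timedelta
from typing import Dict, List


def vendors_needing_reconfirmation(aoc_dates: Dict[str, date], today: date,
                                   max_age_days: int = 365) -> List[str]:
    """Return vendors whose AOC is older than max_age_days or dated in the future."""
    stale = []
    for vendor, aoc_date in sorted(aoc_dates.items()):
        age = today - aoc_date
        # A future-dated AOC is treated as invalid rather than as current.
        if age < timedelta(0) or age > timedelta(days=max_age_days):
            stale.append(vendor)
    return stale
```

Running this ahead of each contract renewal date, rather than on it, leaves time to obtain a fresh AOC before the old one lapses.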

For context on how third-party vendor obligations interact with data transfer and processing, the voice AI cross-border data transfer guide covers vendor-side compliance obligations under GDPR and adequacy frameworks that sit alongside PCI requirements.

The emerging risk: AI-generated call summaries

An often-overlooked PCI DSS exposure point in voice AI deployments is the AI-generated call summary. Many enterprise voice platforms now produce automatic summaries after hang-up, written to CRM systems for agent context on the next interaction.

If a caller spoke their card number during a call, and the AI summary system processed the full transcript to generate a summary, the summary system has processed cardholder data — even if the summary itself contains no card number. Under PCI DSS v4.0, any system that processes cardholder data is in scope, even if the processing is transient.

If your AI summary pipeline ingests full call transcripts, and those transcripts can contain cardholder data under any circumstances, the summary pipeline is in PCI scope unless you implement one of the following:

Transcript filtering before the summary pipeline: PANs and SAD are redacted from the transcript before it is passed to the summary generation system. Redaction must be applied to the raw transcript, not to the summary output.
DTMF masking or pause-and-resume at the source: If card data never enters the audio or transcript in the first place, the summary pipeline has no exposure.
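The first option, filtering the raw transcript before it reaches the summary pipeline, might look like the sketch below. It uses a regular expression to find PAN-shaped digit runs and the Luhn checksum (the check-digit scheme used by payment card numbers) to cut false positives. The function names and the mask text are hypothetical, and the approach is a heuristic: it will miss numbers that were mis-transcribed, and it cannot reliably recognise SAD such as a three-digit verification code.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_pans(transcript: str, mask: str = "[REDACTED PAN]") -> str:
    """Mask Luhn-valid, PAN-shaped digit runs in a raw transcript."""
    def _replace(match: "re.Match[str]") -> str:
        digits = re.sub(r"\D", "", match.group())
        return mask if luhn_valid(digits) else match.group()
    return CANDIDATE.sub(_replace, transcript)
```

The redaction runs on the raw transcript, so the summary generator only ever sees the masked text. Because a filter like this can miss cases, it works better as a backstop than as the sole control.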

This is an architecture decision that must be made at design time. The pattern of discovering it during a PCI audit — when months of CRM entries have been populated by summaries generated from transcripts that contained card data — is expensive and difficult to remediate.

Sector-specific considerations

The right PCI architecture varies by sector because the nature of the payment interaction — its frequency, predictability, and call context — differs significantly across verticals.

Utilities and telecoms. Utilities and telecoms handle high volumes of payment calls — bill payments, reconnections, and payment plan arrangements. These are typically structured, predictable interactions where DTMF masking is the standard approach. The challenge is that vulnerable customers — particularly those in hardship or on payment plans — often provide card data in atypical call contexts: complaint calls, arrears conversations, or multi-topic service calls where the payment arises mid-call without scripted anticipation. Pause-and-resume controls in utilities must account for unscripted card data entry, with anomaly detection monitoring recordings for spoken card numbers as a compensating control.

Retail and e-commerce. Retail voice channels — order placement, dispute resolution with refund to card, returns — have high variability in when card data appears. Callers frequently offer card data mid-conversation without prompting. Full agent descoping or robust DTMF masking with broad trigger coverage are the preferred architectures. Retail deployments should also account for subscription or stored-card calls where the caller may attempt to provide a new card number to update their account.

Financial services. FCA-regulated firms face dual compliance: PCI DSS for card handling, and FCA conduct obligations, including the Consumer Duty, for fair treatment in payment interactions. Financial services voice AI must evidence not just that card data was not stored, but that the payment interaction was transparent, consented to, and not unduly pressured. Audit trails for payment calls must include the transaction outcome, the agent logic that triggered the payment flow, and evidence of consumer-appropriate handling.

Healthcare and pharmaceutical. Healthcare payments (co-pays, prescription fees, treatment deposits) are often incidental to the primary call purpose. Healthcare voice AI deployments rarely design for payment handling as a primary use case, which is precisely the risk. A patient calling to reschedule an appointment may volunteer card details for an outstanding balance. Healthcare deployments need a catch-all pause trigger for any mention of payment intent, regardless of the call's primary purpose. For broader context on regulated healthcare voice deployments, the voice AI architecture guide for regulated industries covers the layered compliance stack that healthcare deployments must satisfy.

Implementation checklist

Use this checklist at design time and as an ongoing audit tool for voice AI deployments where payment call handling is in scope.

Before go-live:

Identify all call types where payment card data could be captured — both by design (scripted payment flows) and by caller behaviour (unscripted card mentions in non-payment calls)
Select and document your descoping architecture (pause-and-resume, DTMF masking, or full agent descoping)
Complete a targeted risk analysis per PCI DSS v4.0 Requirement 12.3.2
Confirm PCI DSS compliance status of your voice AI vendor and telephony provider — obtain a current, dated AOC, not a historic one
Review your MSA and DPA with the voice AI vendor to confirm explicit exclusion of cardholder data from retention, analytics, and model training
Configure and test recording controls with simulated card data entry — including failure scenarios where the pause signal does not arrive in time
Document the full data flow from call initiation to payment outcome (required for SAQ or ROC submission)
Confirm your AI summary pipeline does not ingest unfiltered transcripts that may contain card data

Ongoing:

Monitor recording systems for inadvertent cardholder data capture — implement anomaly detection or periodic transcript sampling for PAN patterns
Review and re-confirm vendor AOC annually — build this into your contract renewal calendar
Update your targeted risk analysis after any architecture change, platform upgrade, or new call type added to scope
Include voice AI explicitly in your annual PCI DSS scope review — it is frequently omitted from scope documentation
Test pause-and-resume and DTMF controls after every significant platform update or telephony infrastructure change
Brief your Qualified Security Assessor (QSA) on your voice AI platform architecture before each annual assessment
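The periodic transcript sampling item in the list above can be sketched as follows. Because transcripts often render spoken numbers as words, the sketch first maps digit words to digits and then flags any call with a run of 13 to 19 digits for human review. Everything here is an illustrative assumption: the word map, the function names, the call-ID keyed dictionary, and the decision to flag on length alone (a checksum test could narrow the results further).

```python
import random
import re
from typing import Dict, List, Optional

WORD_TO_DIGIT = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

# 13 to 19 digits with up to two separator characters between neighbours.
PAN_SHAPED = re.compile(r"(?<!\d)\d(?:[\s,.-]{0,2}\d){12,18}(?!\d)")


def spoken_to_digits(text: str) -> str:
    """Replace spoken digit words with digits, leaving all other text untouched."""
    tokens = re.findall(r"[A-Za-z]+|\d+|[^A-Za-z\d]+", text)
    return "".join(WORD_TO_DIGIT.get(token.lower(), token) for token in tokens)


def flag_transcripts(transcripts: Dict[str, str], sample_size: int,
                     seed: Optional[int] = None) -> List[str]:
    """Sample call IDs at random and return those containing a PAN-shaped digit run."""
    rng = random.Random(seed)
    call_ids = sorted(transcripts)
    sample = rng.sample(call_ids, min(sample_size, len(call_ids)))
    return sorted(
        call_id for call_id in sample
        if PAN_SHAPED.search(spoken_to_digits(transcripts[call_id]))
    )
```

A flagged call is a prompt for review and, if confirmed, for purging the recording and transcript and investigating why the primary control did not fire.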

What to ask your voice AI vendor

When evaluating a voice AI vendor for a deployment that will handle payment calls, these questions separate vendors who have thought through PCI from those who are learning about it during your procurement:

1. What is your current PCI DSS compliance level? The vendor should produce a current AOC on request. If they cannot, or if the AOC is more than 12 months old, they are either non-compliant or compliant under a scope that may not cover your deployment requirements.

2. Does your platform support pause-and-resume recording via a documented API? The pause trigger must be reliable, latency-tolerant, and logged. Ask for the technical documentation for how the pause signal is implemented, what happens if the signal fails, and how failures surface in your audit trail.

3. Does your transcription pipeline filter cardholder data before storage? If the vendor transcribes calls, they must either filter PANs from transcripts before writing them to storage, or ensure transcripts containing card data are architecturally impossible (DTMF masking). Ask for documentation of their transcript data handling, not just a verbal assurance.

4. What is your data processing agreement position on cardholder data? Your DPA must explicitly exclude cardholder data from any retention beyond immediate processing, from training data use, and from analytics or benchmarking pipelines. This must be contractual, not a policy statement. The voice AI MSA contract clauses guide covers the 11 clauses enterprise legal teams should require — cardholder data exclusion is one of them.

5. Do you maintain a QSA relationship? Vendors who take PCI seriously engage a Qualified Security Assessor for their platform's annual assessment. Ask which QSA they use, when the last assessment was completed, and whether you can receive a summary of findings on request.

6. How do you handle calls where a caller volunteers card data outside a scripted payment flow? This question distinguishes vendors who have designed for real-world caller behaviour from those who have designed only for the happy path. The answer should include a combination of LLM intent monitoring, anomaly detection, and a compensating control for unscripted card data capture.

For the broader picture of how PCI intersects with GDPR consent architecture in the UK, the GDPR consent capture guide for AI voice calls covers how lawful basis decisions interact with recording and data handling obligations.

Want to see this in production? Try Dilr Voice (free, $20 credits), review the AI placement diagnostic for architecture-level compliance review, or read how we approach compliance-first voice AI deployment across regulated industries.


Written by the Dilr.ai engineering team — practitioners who ship enterprise voice AI in production across payments, utilities, and regulated financial services. Follow us on LinkedIn for shipping notes, or subscribe via the RSS feed.
