Financial Services · Digital Banking

Compliance violations caught in pre-production.

A leading US digital bank serving millions of customers, running voice and chat AI agents across millions of monthly conversations in a regulated environment.

The challenge

Chime had two problems running in parallel. The first was a vendor decision they had to get right: a choice between two leading AI agent platforms, with no way to compare them beyond demos. The second was harder: their agents were occasionally claiming capabilities they didn't have. In regulated banking conversations, overstating a capability isn't a customer experience issue. It's a compliance one.

Manual QA couldn't keep up. At millions of monthly conversations, sampling 1–2% of calls meant most issues were caught only after customers complained. And every vendor update risked introducing regressions in the workflows that mattered most: fraud escalation, identity verification, and sensitive banking actions.

No objective vendor comparison.

The team had to choose between two AI agent platforms based on demos and intuition.

Compliance violations slipping through.

Agents were claiming capabilities they didn't have, caught only after customer feedback.

No regression coverage on updates.

Every model or prompt change risked breaking fraud and identity verification flows.

Manual QA didn't scale.

Sampling 1–2% of conversations left the rest uncovered.

The impact

2,768 calls analyzed in vendor bake-off

<2 weeks from kick-off to vendor decision

100% of conversation designs validated pre-launch

13× expansion in 12 months

Before and after

Metric	Before Coval	With Coval
Vendor evaluation	Demos and vendor-supplied data	Quantified across 2,768 calls in under two weeks
Compliance coverage	Reactive, caught after customer complaints	Proactive, caught in simulation before launch
QA model	1–2% of calls manually sampled	Every conversation design simulated before launch
Regression testing	Limited spot-checks between vendor updates	Every vendor update validated against the full suite
AI governance	Distributed, no single source of truth	Coval is the system of record

How Coval helped

Chime didn't start with the goal of building an AI governance program. They started with a vendor decision and a small set of workflows to test. Everything changed when Coval surfaced a compliance issue that nobody on the team had seen coming.

An objective vendor bake-off in under two weeks.

Coval ran both candidate AI agent platforms against the same 2,768 simulated calls, with the same evaluators measuring the same compliance and accuracy criteria. The output wasn't a vendor opinion. It was a comparison the team could defend, with failure modes mapped per vendor and per workflow type.

Compliance violations caught before customers saw them.

Months into pre-production testing, Coval surfaced cases where the agent was overstating what it could do in regulated banking workflows. These were exactly the kinds of issues that manual QA might spot once or twice but could not catch at scale. Catching them in simulation reframed Coval inside Chime from a testing tool to a compliance one.

Every change runs through simulation before it ships.

Coval is now where every conversation design, vendor update, and compliance check goes before launch. The team moved from reactive triage to proactive validation, with a regression suite that grows every time a new failure mode is found.

The result

For Chime, the shift was bigger than any single metric. The team stopped finding out about agent failures from customer feedback and started finding them in simulation, before launch. The compliance team stopped reviewing a sample and started signing off on a process. Vendor updates stopped being a quarterly risk and became a routine release.

"Coval moved AI governance at Chime from sampling to certainty. Every conversation design, every vendor update, every compliance check runs through simulation before it touches a member."

What started as a vendor evaluation became the system of record for AI governance at Chime, across voice, across chat, across millions of monthly conversations.