The Governance Gap Nobody Is Talking About

Most agentic AI programs have a documentation problem disguised as a governance problem. The gap between what firms say their systems do and what those systems actually do is widening — and regulators are starting to notice.

The Problem with Policy-First Governance

The standard playbook for AI governance at financial firms starts with a policy document. Someone in compliance drafts a framework — model risk management principles, a human oversight requirement, an escalation procedure — and the policy gets approved, signed, and filed. The program is considered launched.

The problem is that policies describe intent. They say what a firm plans to do. Examiners are increasingly focused on something different: what the firm actually did, and whether they can prove it. For most agentic AI deployments, those two things are not the same. The agent was deployed without the controls the policy described. The oversight workflow exists on paper but is not enforced in production. The escalation procedure covers the cases that were anticipated, not the ones that actually occur.

This gap is not new in financial services — it is the same gap that appears in AML programs, in trading surveillance, in complaint handling. What makes it acute for agentic AI is the speed at which the gap opens. A language model integrated into a workflow can make thousands of consequential decisions before anyone notices the oversight framework does not actually apply to it.

What Examiners Are Actually Looking For

Regulators examining AI governance are not primarily interested in your policy document. They are interested in the evidence trail that either confirms or contradicts it. Can you show that the oversight procedure described in the policy was actually applied to the decisions the system made? Can you produce a log of the cases that triggered human review, and explain why the cases that did not trigger review were treated differently?

The questions that create the most difficulty are not the ones about catastrophic failures. They are about the ordinary operation of the system in the months before the examination. What did the agent decide on its own? What did it escalate? Who reviewed the escalations, and what documentation exists of those reviews? If the answer to any of these is "we would need to reconstruct that from the logs" or "it depends on which team member was working that day," the governance program has a documentation problem.

FINRA's 2024 guidance on AI supervision made this explicit: the key question is not whether a firm has an AI policy, but whether the policy is implemented in a way that creates a verifiable record. The SEC's exam priorities have moved in the same direction. The policy document is no longer taken as evidence that governance exists; it is only the starting point for proving that it does.

Building for Demonstrability

A governance program built for demonstrability starts from the opposite end. Instead of writing a policy and building controls to match it, you begin with the question: what would an examiner ask for, and can we produce it? That question surfaces the gaps before regulators do.

In practice, this means designing logging into the agentic workflow from the beginning, not as an afterthought. Every decision the system makes autonomously, every case it escalates, every parameter it uses to make that determination — these need to exist in a structured, queryable form. Not a log file that can be parsed if someone writes a script, but a record that can be filtered, exported, and explained to a non-technical examiner within hours of a request.
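As a rough illustration of what "structured and queryable" can mean, the sketch below records each agent decision as a row in a small SQLite table at the moment the decision is made. It is a minimal sketch, not a prescribed schema: the field names (decision_id, confidence, escalated, reviewer, rationale) and the thresholds in the example are assumptions for illustration.

```python
# Minimal sketch: persist each agent decision as a structured, queryable record.
# Field names and values here are illustrative assumptions, not a standard.
import json
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DecisionRecord:
    decision_id: str        # stable identifier for the agent's decision
    timestamp: str          # ISO-8601, UTC, captured at decision time
    action: str             # what the agent decided to do
    confidence: float       # the score the agent used to make the call
    escalated: bool         # whether the case was routed to human review
    reviewer: Optional[str] # who reviewed it, if anyone
    rationale: str          # parameters/inputs behind the determination


def init_store(path: str = "decisions.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS decisions (
            decision_id TEXT PRIMARY KEY,
            timestamp   TEXT NOT NULL,
            action      TEXT NOT NULL,
            confidence  REAL NOT NULL,
            escalated   INTEGER NOT NULL,
            reviewer    TEXT,
            rationale   TEXT NOT NULL
        )
    """)
    return conn


def log_decision(conn: sqlite3.Connection, record: DecisionRecord) -> None:
    conn.execute(
        "INSERT INTO decisions VALUES (?, ?, ?, ?, ?, ?, ?)",
        (record.decision_id, record.timestamp, record.action,
         record.confidence, int(record.escalated), record.reviewer,
         record.rationale),
    )
    conn.commit()


# Example: the agent approves a case autonomously and the record is written
# as part of the workflow, not reconstructed later from free-text logs.
conn = init_store()
log_decision(conn, DecisionRecord(
    decision_id="case-0001",
    timestamp=datetime.now(timezone.utc).isoformat(),
    action="auto_approved",
    confidence=0.94,
    escalated=False,
    reviewer=None,
    rationale=json.dumps({"threshold": 0.90, "model_version": "v3.2"}),
))
```

With records in this form, producing an export for an examiner becomes a query and a spreadsheet dump rather than a log-parsing project.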

It also means being honest about what the current oversight workflow actually covers. If your human-in-the-loop procedure applies to high-confidence outputs and the agent has been routing low-confidence outputs to a different queue, the governance program does not cover what you think it covers. Mapping the actual decision surface of the system — including the paths that bypass the oversight procedure — is the foundational step that most firms have not taken.
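Mapping the decision surface can start from the same store. The sketch below, which reuses the hypothetical decisions table and connection from the previous example, counts how many decisions reached human review versus how many bypassed it, broken down by confidence band. The 0.90 cutoff and band labels are placeholders, not a recommended policy.

```python
# Sketch of a coverage check over the hypothetical decisions table:
# which paths through the system ended in human review, and which did not.
rows = conn.execute("""
    SELECT
        CASE
            WHEN confidence >= 0.90 THEN 'high_confidence'
            ELSE 'low_confidence'
        END AS band,
        escalated,
        COUNT(*) AS n
    FROM decisions
    GROUP BY band, escalated
    ORDER BY band, escalated
""").fetchall()

for band, escalated, n in rows:
    path = "human review" if escalated else "autonomous (no review)"
    print(f"{band}: {n} decisions -> {path}")
```

A non-zero count of unreviewed decisions in a band the oversight policy claims to cover is exactly the kind of gap this mapping exercise is meant to surface before an examiner does.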

The firms that will fare best in AI examinations are not the ones with the most comprehensive policies. They are the ones that can sit down with an examiner, pull up a dashboard, and walk through exactly what the system did, why it did it, and who reviewed it. That capability does not emerge from a policy document. It has to be built.