How BitSafe Runs on Notion — Part 5: The Agent Governance Model

The exact pattern that lets a small team run around 60 AI agents without losing control: a registry, narrow scopes, and a propose-approve-apply loop for change control.

One agent is a demo. Sixty is an operations problem.

Standing up your first AI agent is a good afternoon's work. The trouble starts later, when there are a dozen of them, then thirty, and nobody can say out loud what each one does, what it is allowed to touch, or who answers for it when one misbehaves. Most teams meet that fog in one of two ways. They freeze at five agents because going further feels reckless, or they keep adding until the fleet is a black box nobody fully understands.

We went a third way. Today around 60 governed agents run inside our workspace, and the reason we can keep adding more is not that we are careful by temperament. It is that every agent lives inside the same governance pattern.

BitSafe is an infrastructure company. We brought Bitcoin onto Canton with CBTC, and we are open-sourcing the Decentralization Manager, one of the first decentralization layers for Canton. The instinct that shapes our products (build the layer that others build on) is the same one that led us to run our own team this way. Here is what that pattern looks like.

Sprawl is the real problem, not capability

The thing that breaks first at scale is not the quality of any single agent. It is your ability to answer simple questions about the whole set. Which agents are running right now? What can each one change? Who approves a change to one of them? When something looks wrong, where do you look first? Without ready answers, every new agent raises the odds that two of them quietly step on each other, or that one keeps acting on a rule nobody remembers writing. Governance is how you keep those answers cheap to retrieve, no matter how many agents you run.

The registry: if it is not in the registry, it does not run

Every agent is a row in one database. That database is the source of truth for the whole fleet. Each row records the owner, a one-line purpose, a category, the agent type, and a status that moves through In development, Active, and Retired. It also records the agent's scope, its triggers, the tools it is allowed to use, a link to its instructions page, and a relation to its own change log.

The rule on top of the registry is blunt: if an agent is not in the registry, it does not run. That single constraint kills the most common failure mode, the shadow agent someone spun up for a one-off task and never switched off. A glance down the table shows the entire surface area of automation in the company, which is exactly the view you lose when agents accumulate informally.
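As a rough illustration, the registry rule can be sketched in a few lines of Python. This is a minimal in-memory stand-in, not the actual database: `AgentRow`, `Registry`, and `can_run` are hypothetical names, the sample values are invented, and requiring Active status on top of registration is an assumption added for the sketch.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    IN_DEVELOPMENT = "In development"
    ACTIVE = "Active"
    RETIRED = "Retired"


@dataclass
class AgentRow:
    """One registry row. Field names are illustrative, not the real schema."""
    name: str
    owner: str
    purpose: str
    category: str
    agent_type: str
    status: Status
    scope: str = ""
    triggers: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)
    instructions_url: str = ""


class Registry:
    """In-memory stand-in for the registry database."""

    def __init__(self) -> None:
        self._rows: dict[str, AgentRow] = {}

    def register(self, row: AgentRow) -> None:
        self._rows[row.name] = row

    def can_run(self, agent_name: str) -> bool:
        # Unregistered agents never run. Requiring Active status as well
        # is an assumption layered on top of the stated rule.
        row = self._rows.get(agent_name)
        return row is not None and row.status is Status.ACTIVE


registry = Registry()
registry.register(
    AgentRow(
        name="contact-classifier",          # hypothetical agent
        owner="alice",                      # hypothetical owner
        purpose="Label each new contact with one category",
        category="CRM",
        agent_type="Autofiller",
        status=Status.ACTIVE,
        scope="Contacts database, 'Category' property only",
        triggers=["row created"],
        tools=["update_property"],
    )
)

print(registry.can_run("contact-classifier"))  # True
print(registry.can_run("one-off-script"))      # False: not in the registry
```

The point of the sketch is that the run check is a lookup, not a judgment call: a shadow agent fails it by construction.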

Four shapes cover almost everything

Most of the fleet falls into four patterns, and naming them keeps the system legible.

Watchers react to something appearing: a new page, a new row, a new message. They classify it, enrich it, or flag it for a person.
Schedulers run on a clock. They prep the day, score the pipeline before the week starts, or check what is due.
Autofillers classify and fill database properties at volume. They are the quiet workhorses, doing one narrow labeling job across thousands of rows.
Assistants wait to be asked. You mention one when you want it, and it answers in context.

Once you know which of the four shapes an agent is, you already know roughly how it behaves, how it gets triggered, and what could go wrong with it. A new agent inherits a category instead of inventing one.
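One way to picture "inheriting a category" is a lookup from shape to trigger. The sketch below is illustrative only: the `Shape` enum, the `TRIGGER` table, and the agent name are hypothetical, and the Autofiller trigger is an assumption, since the description above says what Autofillers do but not what starts them.

```python
from enum import Enum


class Shape(Enum):
    WATCHER = "Watcher"
    SCHEDULER = "Scheduler"
    AUTOFILLER = "Autofiller"
    ASSISTANT = "Assistant"


# What wakes each shape up. The Autofiller entry is an assumption: the text
# describes what Autofillers do, not what triggers them.
TRIGGER = {
    Shape.WATCHER: "something new appears (page, row, message)",
    Shape.SCHEDULER: "a clock",
    Shape.AUTOFILLER: "a row that needs a property filled (assumed)",
    Shape.ASSISTANT: "a person mentions it",
}


def describe(agent_name: str, shape: Shape) -> str:
    """Summarize an agent from its shape alone."""
    return f"{agent_name} is a {shape.value}: triggered by {TRIGGER[shape]}"


print(describe("pipeline-scorer", Shape.SCHEDULER))
# pipeline-scorer is a Scheduler: triggered by a clock
```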

One job per agent

Narrow scope is the design rule, not an accident. The temptation with a capable model is to hand one agent a broad mandate and let it work things out. We do the opposite. A storyline-mapping agent edits exactly one property and nothing else. A contact-classification agent does a single labeling job, and it has now run about 7,700 times doing only that. The agent that preps meetings does not also reach into the CRM.

Narrow scope buys two things. Behavior becomes predictable, because an agent that can only change one field cannot cause a surprise three tables away. And failures become easy to localize, because when something is off you already know which small job to inspect. A fleet of single-purpose agents is far easier to reason about than a handful of clever generalists.
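A scope like "edits exactly one property and nothing else" can be enforced mechanically rather than by convention. The following is a minimal sketch under that assumption; `Scope`, `ScopeViolation`, `apply_update`, and the database and property names are all hypothetical.

```python
from dataclasses import dataclass


class ScopeViolation(Exception):
    """Raised when an agent tries to write outside its scope."""


@dataclass(frozen=True)
class Scope:
    database: str
    writable_properties: frozenset[str]


def apply_update(scope: Scope, database: str, row: dict, updates: dict) -> dict:
    """Return a copy of `row` with `updates` applied, or raise if out of scope."""
    if database != scope.database:
        raise ScopeViolation(f"no write access to {database!r}")
    blocked = set(updates) - scope.writable_properties
    if blocked:
        raise ScopeViolation(f"properties outside scope: {sorted(blocked)}")
    return {**row, **updates}


# Hypothetical scope for a storyline-mapping agent: one database, one property.
storyline_scope = Scope(database="Content", writable_properties=frozenset({"Storyline"}))
row = {"Title": "Q3 recap", "Storyline": None, "Status": "Draft"}

updated = apply_update(storyline_scope, "Content", row, {"Storyline": "Infrastructure"})
print(updated["Storyline"])  # Infrastructure

try:
    apply_update(storyline_scope, "Content", row, {"Status": "Published"})
except ScopeViolation as exc:
    print(exc)  # properties outside scope: ['Status']
```

Because the check rejects the whole update, a failure points at one agent and one property, which is the localization benefit described above.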

Propose, approve, apply: change control for instructions

This is the core of the model, so it is worth stating precisely. No agent can edit its own instructions. An agent that can rewrite its own rules is an agent whose behavior can drift with nobody noticing.

Instead, every instruction change runs through one controlled loop:

Propose. The change is written into a dedicated changes database, capturing the proposed wording, where the suggestion came from, and a status.
Approve. The proposal is routed to the agent's human owner, the person named in the registry. The owner is the approver. This is not a central admin signing off on everything; approval sits with whoever owns that agent.
Apply. Only after approval is the change applied, and it is applied by a dedicated tuning agent whose only job is to apply approved changes.

The result is that an agent's behavior only ever changes through a logged, owner-approved step. There is always a written record of what changed, who asked for it, and who approved it.

flowchart LR
    P["1. Propose<br/>change written to the changes database<br/>wording · source · status"] --> A["2. Approve<br/>routed to the agent's human owner<br/>the owner is the approver"]
    A --> AP["3. Apply<br/>a dedicated tuning agent<br/>applies the approved change"]
    AP --> L["Audit trail<br/>queryable database + weekly digest"]
    O["Oversight agent — read-only, watches everything"] -.-> L
    style A fill:#F4652F,color:#fff
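The propose, approve, apply loop behaves like a small state machine, and a sketch makes the guard conditions explicit. Everything named here is an assumption for illustration: the status values, the `Change` record, the `approve` and `apply_change` functions, the `tuning-agent` identifier, and the sample owner and wording. Dictionaries stand in for the registry and the instructions pages.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

TUNING_AGENT = "tuning-agent"  # hypothetical identifier for the applying agent


class ChangeStatus(Enum):
    PROPOSED = "Proposed"
    APPROVED = "Approved"
    APPLIED = "Applied"


@dataclass
class Change:
    """One row in the changes database (illustrative fields)."""
    agent: str
    proposed_wording: str
    source: str
    status: ChangeStatus = ChangeStatus.PROPOSED
    approved_by: Optional[str] = None


def approve(change: Change, approver: str, owners: dict[str, str]) -> None:
    """Only the owner named in the registry may approve a proposed change."""
    if change.status is not ChangeStatus.PROPOSED:
        raise ValueError("only a proposed change can be approved")
    if owners.get(change.agent) != approver:
        raise PermissionError("only the agent's registered owner can approve")
    change.status = ChangeStatus.APPROVED
    change.approved_by = approver


def apply_change(change: Change, actor: str, instructions: dict[str, str]) -> None:
    """Only the tuning agent applies, and only after approval."""
    if actor != TUNING_AGENT:
        raise PermissionError("only the tuning agent applies changes")
    if change.status is not ChangeStatus.APPROVED:
        raise ValueError("change has not been approved")
    instructions[change.agent] = change.proposed_wording
    change.status = ChangeStatus.APPLIED


owners = {"meeting-prep": "dana"}                            # from the registry
instructions = {"meeting-prep": "Prep tomorrow's meetings."}

change = Change(
    agent="meeting-prep",
    proposed_wording="Prep tomorrow's meetings and list open questions.",
    source="owner feedback",
)
approve(change, approver="dana", owners=owners)
apply_change(change, actor=TUNING_AGENT, instructions=instructions)
print(change.status.value, "by", change.approved_by)  # Applied by dana
```

Note that the agent being changed never appears as an actor: it has no code path that writes to its own instructions.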

Agents that govern the agents

The change-control loop is itself run by agents, but split carefully so that no single agent holds too much power.

The tuning agent applies approved changes and does nothing else. It never decides what should change and never approves anything. A separate oversight agent is read-only by design: it cannot edit any agent, and its job is to watch. It posts a weekly digest of every change across the fleet and raises an alert whenever a change is actually applied. The audit trail is not a log file someone has to remember to open; it is a database, queryable like everything else in the workspace.

Splitting the work this way matters. The agent that can act cannot approve, and the agent that watches cannot act. That separation is what keeps the loop honest as the fleet grows.
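That separation can be written down as data and checked automatically. The sketch below assumes three capabilities and two forbidden combinations drawn from the sentences above; the `Capability` flag, the role names, and the `violations` helper are hypothetical.

```python
from enum import Flag, auto


class Capability(Flag):
    APPROVE = auto()  # sign off on a proposed change
    APPLY = auto()    # write the approved change to an agent
    WATCH = auto()    # read the audit trail and report on it


# Hypothetical role table mirroring the split described above.
ROLES = {
    "owner": Capability.APPROVE,
    "tuning-agent": Capability.APPLY,
    "oversight-agent": Capability.WATCH,
}

# "The agent that can act cannot approve, and the agent that watches cannot act."
FORBIDDEN_PAIRS = [
    (Capability.APPLY, Capability.APPROVE),
    (Capability.WATCH, Capability.APPLY),
]


def violations(roles: dict[str, Capability]) -> list[str]:
    """Return the names of roles that hold a forbidden pair of capabilities."""
    return [
        name
        for name, caps in roles.items()
        if any((caps & a) and (caps & b) for a, b in FORBIDDEN_PAIRS)
    ]


print(violations(ROLES))  # []

too_powerful = {**ROLES, "super-agent": Capability.APPLY | Capability.APPROVE}
print(violations(too_powerful))  # ['super-agent']
```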

Approval gates for anything that leaves the building

Inside the workspace, agents have room to work. The moment anything points outward, a person stands in the path: agents propose, and people approve. The agent that handles outbound communication requires three separate human sign-offs before a single message goes out. The newsletter agent drafts but never publishes. The scheduling agent proposes dates, and a person confirms them.

None of this is friction for its own sake. The gates sit exactly where a mistake would be public or hard to undo, and nowhere else, so the routine internal work stays fast.
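An outbound gate of this kind reduces to a count of distinct human approvals. The sketch below is an assumption-laden illustration: it reads "three separate human sign-offs" as three different people, and `OutboundDraft`, `sign_off`, `can_send`, and the names used are hypothetical.

```python
from dataclasses import dataclass, field

REQUIRED_SIGNOFFS = 3  # from the text; "three distinct people" is an assumption


@dataclass
class OutboundDraft:
    """A message an agent has drafted but is not allowed to send itself."""
    body: str
    signoffs: set[str] = field(default_factory=set)

    def sign_off(self, person: str) -> None:
        # A set ignores repeat sign-offs from the same person.
        self.signoffs.add(person)

    def can_send(self) -> bool:
        return len(self.signoffs) >= REQUIRED_SIGNOFFS


draft = OutboundDraft(body="Hello from BitSafe")
draft.sign_off("alice")
draft.sign_off("bob")
draft.sign_off("alice")  # duplicate, does not count twice
print(draft.can_send())  # False

draft.sign_off("carol")
print(draft.can_send())  # True
```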

Why this is what makes scale possible

It is tempting to read all of this as overhead, the bureaucracy you take on once you have too many agents. It is the reverse. Because each agent is narrow and owned, with every change logged, the marginal cost of one more agent is low. Adding the 61st agent does not add fog: it enters the same registry, fits one of the same four shapes, gets one job, and inherits the same propose-approve-apply loop. The structure is what lets the number keep climbing without the team losing track of it.

That is the quiet argument underneath the whole system. Governance is not the tax you pay for scale. It is the thing that makes scale safe to reach for.

Where to start

If you are building on Canton and want to compare notes on running an agent fleet without losing control, find Kadeem Clarke on the Canton ecosystem Slack. Happy to walk through any part of the model.
