Every system built for a regulated domain hits the same fork early in its construction. You have a piece of knowledge, and you have to decide what kind of thing it is. Is the list of sanctioned parties a rule, or is it data? Is the table of drug interactions a rule, or data? The adequacy decisions between countries, the withholding rates in tax treaties, the threshold percentages that trigger a filing? Each of these can be modeled either way, and the choice seems like an implementation detail. It is not. It is the decision that determines whether the system can survive a changing world, because the world changes these things constantly, and the system that tangled them into its logic has to be rebuilt every time one moves.
This paper is about that split. What belongs in a rule, what belongs in data, why they must be kept apart, and what goes wrong when they are not.
Two kinds of knowledge
The split tracks a real difference between two kinds of regulatory knowledge, and once you see the difference the fork mostly resolves itself.
The first kind is the obligation. The structure of what the regulation requires. A breach must be notified within a fixed window after the organization becomes aware of it. A holding crossing a threshold triggers a filing. A transfer to a non-adequate destination needs a safeguard. These are stable. The obligation that crossing a disclosure threshold creates a filing duty has held for decades, even as the specific threshold and the specific window have been amended. The obligation is the shape of the requirement, and shapes change slowly.
The second kind is the fact. The specific values the obligation operates on. The exact percentage that counts as the threshold. The exact number of days in the window. Which destinations are adequate this year. What rate this treaty sets. These change all the time, on schedules set by legislatures, regulators, and courts, none of whom consult you. They are not the shape of the requirement. They are the current readings plugged into it.
A rule encodes the obligation. Data holds the facts. The rule says "a filing is due within the window from the crossing." The data says "the threshold is five percent and the window is five business days." Keep those as separate things, and you have drawn the line in the right place.
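A minimal Python sketch of that split, assuming a hypothetical `RegimeParams` record for the facts and a hypothetical `filing_required` function for the obligation. The rule knows only that a threshold exists; the number arrives as data. "Crossing" is simplified here to moving from at-or-below the threshold to above it.

```python
from dataclasses import dataclass


# Hypothetical facts record: the current readings for one regime.
@dataclass(frozen=True)
class RegimeParams:
    threshold_pct: float  # e.g. 5.0 for a five percent threshold
    window_days: int      # length of the filing window
    day_basis: str        # "business" or "calendar"


# The obligation: crossing the threshold creates a filing duty.
# No number appears in the logic; every value is read from the params.
def filing_required(prev_pct: float, new_pct: float, params: RegimeParams) -> bool:
    return prev_pct <= params.threshold_pct < new_pct


params = RegimeParams(threshold_pct=5.0, window_days=5, day_basis="business")
print(filing_required(4.2, 5.1, params))  # True
print(filing_required(5.1, 6.0, params))  # False: already above the threshold
```

If the threshold were amended, only the `RegimeParams` value would change; `filing_required` would be untouched.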
Different clocks
The clearest reason to separate them is that they go stale at completely different rates, and binding two things with different staleness clocks into one artifact means the slow thing inherits the fast thing's churn.
Consider the rates of change. A sanctions list updates daily, sometimes several times a day, as names are added and removed. Tax rates and treaty terms move on annual fiscal cycles. Adequacy decisions can change overnight when a court rules. Threshold percentages and filing windows change every few years, when a regulator amends a rule. And the underlying obligations, the shapes, barely change at all across decades.
Now imagine these are tangled together. The sanctioned-party list lives inside the screening logic. The withholding rates are constants in the tax rules. The adequacy status is hard-coded into the transfer checks. Every one of those frequent data changes is now a change to logic, which means it is a code change: written, reviewed, tested, deployed, with all the ceremony that touching decision logic in a regulated system requires and should require. You have taken the most volatile inputs in the entire domain and routed them through the heaviest change process you have. A daily list update becomes a deployment. A court ruling becomes an emergency engineering project.
Separate them, and the same changes become what they should be. A new name on the sanctions list is a data load. A treaty rate change is a row update. A court striking down an adequacy decision is a new version of a table. The logic, which was tested and correct and has not actually changed, is never touched. The world moved; the data moved; the rules held still. The system absorbs change at the speed the change actually has, instead of forcing every change through the speed of a code review.
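A small sketch of a daily list update handled as a data load. Everything here is hypothetical: `screen`, `load_list`, and the party names are placeholders, and exact matching on normalized names is a deliberate simplification of real screening, which typically uses fuzzy matching. The point is only that the list is an input to the rule, not part of it.

```python
# Normalize names so trivial spacing and case differences do not matter.
def normalize(name: str) -> str:
    return " ".join(name.casefold().split())


# Hypothetical screening rule: the list is an argument, so loading
# new names never requires touching this function.
def screen(party_name: str, sanctioned: frozenset[str]) -> bool:
    return normalize(party_name) in sanctioned


# Stand-in for a data load from a sanctions function or a provider feed.
def load_list(names: list[str]) -> frozenset[str]:
    return frozenset(normalize(n) for n in names)


monday = load_list(["Acme Trading Ltd", "Example Holdings"])
tuesday = load_list(["Acme Trading Ltd", "Example Holdings", "Northwind Shipping"])

print(screen("northwind  shipping", monday))   # False
print(screen("northwind  shipping", tuesday))  # True
```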
Different owners
There is a second reason, quieter but just as structural. Rules and data belong to different people.
A rule is an encoding of an obligation, and someone has to be accountable that the encoding is faithful to the regulation. That is expert work, and it carries a sign-off: a person who knows the domain has confirmed that this rule means what the law means. The trust in a rule comes from that human judgment, and it is slow and expensive and worth it.
Data is different. The current sanctioned-party list is maintained by a sanctions function, or licensed from a provider, and refreshed on a schedule. The treaty rates are maintained against published treaties. The adequacy matrix is kept current against regulatory decisions. This is maintenance work, often continuous, and it is owned by whoever is responsible for keeping the reference current, which is usually not the same person who signs off on the obligation logic. The rule author and the data steward are different roles, with different skills and different cadences.
Tangle rules and data together and you have forced these two roles into one artifact, so that maintaining a rate requires touching logic the rate steward is not qualified to own, and revising an obligation requires wading through data the rule author should not be editing. Separate them and each role works in its own layer, at its own pace, with its own accountability.
The seam
The mechanism that makes the separation work is simple to state: rules reference data, they never embed it. The threshold rule does not contain the number five. It contains a reference to the threshold for this regime, and the threshold lives in a table. When the threshold changes, the table changes, and the rule, which only ever pointed at the table, now reads the new value without having changed at all.
This is what makes a regulatory amendment a configuration update. When the SEC shortened the initial Schedule 13D filing window from ten calendar days to five business days, a system built this way changed two fields in a single table row: the day count and the day basis. The rule that says "a filing is due within the window from the crossing" did not move, because it never knew about ten days. It knew about the window, and the window is data. The amendment that would have been a code change in a tangled system was a row update in a separated one, and the difference is not cosmetic. It is the difference between a system that tracks the law easily and one that falls behind it.
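The amendment can be sketched as two window records fed to one unchanged deadline rule. This is an illustration under stated assumptions, not the SEC's computation method: `add_days`, `filing_deadline`, and the dictionary field names are hypothetical, and business days here skip only weekends, ignoring holidays and filing cut-off times that a real calendar would handle.

```python
from datetime import date, timedelta


# Advance a date by a count of calendar or business days.
# Assumption: business days skip weekends only; holidays are ignored.
def add_days(start: date, count: int, basis: str) -> date:
    if basis == "calendar":
        return start + timedelta(days=count)
    if basis == "business":
        d, added = start, 0
        while added < count:
            d += timedelta(days=1)
            if d.weekday() < 5:  # Monday=0 .. Friday=4
                added += 1
        return d
    raise ValueError(f"unknown day basis: {basis}")


# The rule: a filing is due within the window from the crossing.
# It never knows how long the window is; it reads that from the record.
def filing_deadline(crossing: date, window: dict) -> date:
    return add_days(crossing, window["day_count"], window["day_basis"])


old_window = {"day_count": 10, "day_basis": "calendar"}
new_window = {"day_count": 5, "day_basis": "business"}

crossing = date(2024, 3, 1)  # a Friday
print(filing_deadline(crossing, old_window))  # 2024-03-11
print(filing_deadline(crossing, new_window))  # 2024-03-08
```

Between the two calls only the data differs; `filing_deadline` is the same function.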
The seam also has to be versioned, which is the point where this split meets provenance. Because data changes, and because decisions have to be judged against the law as it stood when they were made, the data is not just current but historical. Each version carries its effective date. A decision made last year references the version of the table that was in force last year, not today's. The rule reads the data as it stood on the date of the decision, so a past decision can still be explained against the values that actually governed it. Separation of rules and data is what makes that possible, because only data can carry versions cleanly; logic with values baked in cannot.
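One way to sketch an as-of lookup, assuming a hypothetical `WINDOW_VERSIONS` table kept sorted by effective date. The effective dates below are placeholders chosen for illustration, not the actual effective dates of any amendment. The lookup uses the standard-library `bisect` module to find the latest version whose effective date is on or before the decision date.

```python
import bisect
from datetime import date

# Hypothetical versioned table, sorted by effective date.
# The dates are illustrative placeholders, not real regulatory dates.
WINDOW_VERSIONS = [
    (date(1990, 1, 1), {"day_count": 10, "day_basis": "calendar"}),
    (date(2024, 1, 1), {"day_count": 5, "day_basis": "business"}),
]


# Return the version in force on a given decision date.
def window_as_of(on: date) -> dict:
    effective_dates = [eff for eff, _ in WINDOW_VERSIONS]
    i = bisect.bisect_right(effective_dates, on) - 1
    if i < 0:
        raise LookupError(f"no version in force on {on}")
    return WINDOW_VERSIONS[i][1]


print(window_as_of(date(2023, 6, 1)))  # {'day_count': 10, 'day_basis': 'calendar'}
print(window_as_of(date(2024, 6, 1)))  # {'day_count': 5, 'day_basis': 'business'}
```

Appending a version never rewrites an old one, so a decision from 2023 still resolves to the values that governed it.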
When the line is hard
Most of the time the fork is obvious. A list of a hundred thousand sanctioned names is data; no one would write it as rules. A single obligation is a rule. But there are genuine hard cases, and they are worth being honest about, because the principle has to handle them.
The hard cases are usually the middle layer, where knowledge is both structured and voluminous. Class-level drug interactions, for instance: is "NSAIDs interact with anticoagulants" a rule or a datum? It has the shape of a rule, a condition and a consequence, but there can be thousands of such statements. The principle that resolves it is this: bulk, enumerable facts that change on an external schedule are data, even when each one looks like a small rule, while the curated, high-judgment layer that an expert signs off on is rules, even when it is small. A comprehensive licensed interaction set is data, loaded and refreshed. A curated set of class-level interaction policies that a clinician reviewed and stands behind is rules. The test is not the shape of the statement. It is whether the knowledge is bulk and externally maintained, or curated and expert-owned. Volume and ownership decide it, not form.
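A rough sketch of the two layers living side by side. The names are hypothetical: `load_interactions` stands in for refreshing a licensed interaction set, whose contents here are placeholder pairs, while `nsaid_anticoagulant_policy` stands in for one curated, expert-signed class-level rule. Even the rule reads its drug-class membership as data.

```python
# Data layer: a bulk, externally maintained set of interacting pairs,
# loaded and refreshed. Pairs are unordered, so each is a frozenset.
def load_interactions(pairs: list[tuple[str, str]]) -> frozenset[frozenset[str]]:
    return frozenset(frozenset(p) for p in pairs)


# Rule layer: a small curated class-level policy an expert stands behind.
# The class membership it consults is itself data.
def nsaid_anticoagulant_policy(a: str, b: str, drug_class: dict[str, str]) -> bool:
    return {drug_class.get(a), drug_class.get(b)} == {"NSAID", "anticoagulant"}


def interaction_flag(a: str, b: str,
                     interactions: frozenset[frozenset[str]],
                     drug_class: dict[str, str]) -> bool:
    return (frozenset((a, b)) in interactions
            or nsaid_anticoagulant_policy(a, b, drug_class))


interactions = load_interactions([("drug_x", "drug_y")])  # placeholder data
drug_class = {"ibuprofen": "NSAID", "warfarin": "anticoagulant"}

print(interaction_flag("ibuprofen", "warfarin", interactions, drug_class))  # True
print(interaction_flag("drug_y", "drug_x", interactions, drug_class))       # True
print(interaction_flag("ibuprofen", "drug_x", interactions, drug_class))    # False
```

The licensed set can grow by thousands of pairs without review of the code; the policy function changes only when the expert who owns it revises it.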
The point
The first real architectural decision in a regulated engine is not which model to use or how to run the rules. It is where to draw the line between rules and data, and the line goes between the obligation and the facts it operates on. Obligations are rules: stable, expert-signed, version-controlled. Facts are data: volatile, separately owned, refreshed on the world's schedule and versioned by effective date. Keep them apart with rules that reference data rather than embedding it, and the system absorbs a changing world the way it should, as configuration rather than engineering. The reason this matters is not elegance. It is that regulation never stops moving, and a system that tangled its logic with the moving parts is a system that is always one amendment behind. The split is what keeps it current, and being current, in compliance, is most of being correct.