
Federal Regulatory Data Hub change alerts: near-real-time OFAC sanctions, SAM debarments, and enforcement action webhooks

10 min read · AI Analytics
Tags: Regulatory data, Compliance, Infrastructure, Cloudflare

A compliance team screening counterparties against the OFAC SDN list needs more than an API they can query. They need to know within minutes when a counterparty gets added. A defense contractor doing daily SAM debarment checks is missing the point: the question is not whether the entity is debarred today, but whether it became debarred since the last check. Building the Federal Regulatory Data Hub as a static query layer was the first step; building the change-detection and alerting layer on top of it is what makes it operationally useful.

This post covers how we detect changes across 208 federal datasets with different publication cadences, how we classify those changes into actionable event types, and how webhook delivery reaches subscribers with at-least-once semantics and sub-15-minute detection windows for the highest-priority lists.

Change detection architecture

Different datasets have different publication patterns, so change detection is not uniform across sources. We use three strategies depending on the dataset's update mechanism:

| Strategy | Datasets | Detection window |
|---|---|---|
| conditional-GET polling | OFAC SDN/Non-SDN, OFAC Consolidated, FinCEN | 10–15 min |
| bulk-file hash delta | SAM.gov exclusions, UFLPA, BIS Entity List, State Debarment | 30–60 min |
| RSS/notification feed | EDGAR 8-K/10-K filings, FDA warning letters, DOJ press releases, CFPB enforcement orders | 10–20 min |

The OFAC conditional-GET approach polls the SDN publication endpoint every 10 minutes with an If-None-Match header carrying the stored ETag. OFAC typically updates the list between 10am and 1pm ET on business days; the 10-minute interval catches updates within two polling cycles. A 304 Not Modified response means nothing changed and the poll ends there. When OFAC returns HTTP 200 (content changed), the ingest pipeline runs immediately rather than waiting for the next scheduled cron.
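A minimal sketch of one polling cycle, assuming the endpoint returns an ETag and honors If-None-Match as described. The `Fetcher` type, `pollOnce`, and `conditionalHeaders` are hypothetical names introduced for illustration; they are not part of the hub's codebase.

```typescript
// Outcome of one conditional-GET poll.
type PollResult =
  | { changed: false }
  | { changed: true; etag: string | null; body: string };

// Minimal fetch-like shape (hypothetical seam so the logic can be tested offline).
type Fetcher = (
  url: string,
  init: { headers: Record<string, string> }
) => Promise<{
  status: number;
  headers: { get(name: string): string | null };
  text(): Promise<string>;
}>;

// Send If-None-Match only when an ETag from a previous poll is stored.
function conditionalHeaders(storedEtag: string | null): Record<string, string> {
  return storedEtag ? { "If-None-Match": storedEtag } : {};
}

async function pollOnce(
  url: string,
  storedEtag: string | null,
  fetcher: Fetcher
): Promise<PollResult> {
  const res = await fetcher(url, { headers: conditionalHeaders(storedEtag) });
  if (res.status === 304) return { changed: false }; // stored ETag still matches
  if (res.status !== 200) throw new Error(`unexpected status ${res.status}`);
  // Content changed: hand back the new ETag and body so ingest can run now.
  return { changed: true, etag: res.headers.get("ETag"), body: await res.text() };
}
```

In a Worker, the global `fetch` should satisfy the `Fetcher` shape, and the caller would persist the returned `etag` before triggering the ingest.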

Record-level diffing

Detecting that the OFAC XML file changed is not enough — we need to know which specific entries were added, modified, or removed to generate meaningful events. We maintain a hash index on normalized records and diff against it after each ingest.

-- Record hash index in Cloudflare D1
CREATE TABLE record_hashes (
  dataset_id    TEXT NOT NULL,
  record_id     TEXT NOT NULL,   -- source's primary key (OFAC uid, SAM CAGE, etc.)
  content_hash  TEXT NOT NULL,   -- SHA-256 of normalized record JSON
  last_seen_at  TEXT NOT NULL,   -- ISO-8601 UTC timestamp (D1 is SQLite, which has no native TIMESTAMPTZ type)
  PRIMARY KEY (dataset_id, record_id)
);

-- Change detection query (after new ingest batch)
-- Returns (added, modified, removed) record_ids
WITH new_hashes AS (
  SELECT record_id, content_hash FROM staging_records WHERE dataset_id = ?
),
old_hashes AS (
  SELECT record_id, content_hash FROM record_hashes WHERE dataset_id = ?
)
SELECT
  n.record_id,
  CASE
    WHEN o.record_id IS NULL THEN 'added'
    WHEN n.content_hash != o.content_hash THEN 'modified'
  END AS change_type
FROM new_hashes n LEFT JOIN old_hashes o USING (record_id)
WHERE o.record_id IS NULL OR n.content_hash != o.content_hash
UNION ALL
SELECT o.record_id, 'removed'
FROM old_hashes o LEFT JOIN new_hashes n USING (record_id)
WHERE n.record_id IS NULL;

Record normalization before hashing strips fields that change on every ingest without indicating a meaningful change (e.g., OFAC's publish_date on the file header, microsecond-precision timestamps that drift due to ingest time). The normalized hash only covers fields that affect screening outcomes: entity name, aliases, sanctions programs, addresses, identification numbers, and remarks.
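The normalization step can be sketched as follows. This is an illustration under assumptions: the property names in `SCREENING_FIELDS` are hypothetical stand-ins for the screening fields listed above, and real source schemas differ per dataset. Nested objects inside arrays are serialized as-is, so their key order is assumed to be stable upstream.

```typescript
// Hypothetical property names for the screening-relevant fields.
const SCREENING_FIELDS = [
  "name", "aliases", "sanctions_programs",
  "addresses", "identification_numbers", "remarks",
] as const;

type RawRecord = Record<string, unknown>;

// Build a stable string: keep only screening fields, serialize and sort
// array members, and emit keys in a fixed order so that equivalent records
// produce identical output regardless of ingest-time noise.
function canonicalize(record: RawRecord): string {
  const out: Record<string, unknown> = {};
  for (const field of SCREENING_FIELDS) {
    const value = record[field];
    if (value === undefined || value === null) continue;
    out[field] = Array.isArray(value)
      ? [...value].map(v => JSON.stringify(v)).sort()
      : String(value).trim();
  }
  return JSON.stringify(out);
}

// SHA-256 of the canonical form, hex-encoded, using Web Crypto.
async function contentHash(record: RawRecord): Promise<string> {
  const bytes = new TextEncoder().encode(canonicalize(record));
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest))
    .map(b => b.toString(16).padStart(2, "0")).join("");
}
```

Because fields outside the list never reach the hash input, a changed file-header date or ingest timestamp leaves the hash untouched, while a new alias changes it.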

Event classification

Each detected change maps to one of seven event types, ordered by operational urgency:

EVENT_TYPES = {
    # Highest priority — immediate compliance action required
    'entity_sanctioned':    'Entity added to OFAC SDN, Consolidated, or FinCEN list',
    'entity_debarred':      'Entity added to SAM exclusions, State debarment, or UFLPA',

    # High priority — enforcement action filed
    'enforcement_filed':    'New SEC enforcement, FDA warning letter, DOJ press release, CFPB order',

    # Medium priority — existing record updated
    'entity_sanctions_modified': 'Existing OFAC/FinCEN record changed (new program, address, alias)',
    'entity_debarment_modified': 'Existing SAM/debarment record changed (scope, dates, type)',

    # Lower priority — administrative updates
    'enforcement_resolved': 'SEC enforcement closed, CFPB consent order satisfied',
    'record_updated':       'Non-enforcement record update (address, SIC code, etc.)',
}

For OFAC records, distinguishing entity_sanctioned from entity_sanctions_modified uses the record hash delta: if the record_id did not exist in the previous hash index, it is a new sanction; if it existed and the hash changed, it is a modification. The same logic applies to SAM exclusions.
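The mapping from a hash delta to an event type can be sketched as a small function. The `DatasetFamily` grouping is an assumption introduced here for illustration. The text does not say how removals are classified or how enforcement_resolved is detected, so this sketch returns null for removals and does not attempt to emit enforcement_resolved.

```typescript
type ChangeType = "added" | "modified" | "removed";

// Hypothetical grouping of datasets by the kind of list they represent.
type DatasetFamily = "sanctions" | "debarment" | "enforcement" | "other";

type EventType =
  | "entity_sanctioned" | "entity_debarred" | "enforcement_filed"
  | "entity_sanctions_modified" | "entity_debarment_modified"
  | "enforcement_resolved" | "record_updated";

function classify(family: DatasetFamily, change: ChangeType): EventType | null {
  // Removal handling is not specified in the text; leave it to the caller.
  if (change === "removed") return null;

  if (change === "added") {
    // record_id was absent from the previous hash index: a new entry.
    switch (family) {
      case "sanctions": return "entity_sanctioned";
      case "debarment": return "entity_debarred";
      case "enforcement": return "enforcement_filed";
      default: return "record_updated";
    }
  }

  // change === "modified": record_id existed and its content hash changed.
  switch (family) {
    case "sanctions": return "entity_sanctions_modified";
    case "debarment": return "entity_debarment_modified";
    // Assumption: other modifications fall back to the generic type.
    default: return "record_updated";
  }
}
```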

Subscription filters

Subscribers configure change alerts using a filter specification stored against their API key. Filters can target specific entities, datasets, event type classes, or severity levels:

# Subscription filter specification (JSON; the # comments are annotations, not part of the stored document)
{
  "filter": {
    # Watch specific entities by your own identifiers
    "entity_ids": ["DUNS:123456789", "LEI:529900...", "CIK:0001234567"],

    # Watch entire lists
    "datasets": ["ofac_sdn", "ofac_consolidated", "sam_exclusions"],

    # Watch by event type severity
    "min_severity": "high",   # high | medium | low

    # Watch by agency
    "agencies": ["ofac", "sam", "sec_edgar", "fda"],

    # Watch by entity name pattern (fuzzy-matched using existing pipeline)
    "entity_name_patterns": ["Acme Corp", "Global Trade Partners"]
  },
  "delivery": {
    "webhook_url": "https://compliance.example.com/regulatory-alerts",
    "hmac_secret": "<subscriber-provided-secret>"
  }
}

The entity_ids filter is the most precise: it maps subscriber-provided identifiers through the entity bridge (CIK, LEI, DUNS, UEI, NPI) to the internal entity_master ID, then watches for any regulatory record linked to that entity to change. A subscriber watching CIK:0001234567 (an SEC filing company) automatically receives alerts for OFAC sanctions, SAM debarment, FDA enforcement, and DOJ actions against the same entity, because the entity bridge links all of them.

The entity_name_patterns filter uses the same three-pass fuzzy matching pipeline from the entity matching system: exact normalized match first, then Jaro-Winkler ≥ 0.88, then TF-IDF cosine ≥ 0.72. Exact matches report a match_confidence of 1.0 in the alert payload; matches from the two fuzzy passes report a value below 1.0.
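The three-pass cascade can be sketched with the similarity functions injected as parameters. The thresholds come from the text; the `Scorer` type, `normalizeName`, and the choice to report the passing score as match_confidence are assumptions made for illustration.

```typescript
// A similarity function returning a score in [0, 1] (hypothetical interface).
type Scorer = (pattern: string, candidate: string) => number;

interface MatchResult {
  pass: "exact" | "jaro_winkler" | "tfidf_cosine";
  match_confidence: number;
}

// Assumed normalization: uppercase, strip punctuation, collapse whitespace.
function normalizeName(name: string): string {
  return name.toUpperCase().replace(/[^A-Z0-9 ]/g, " ").replace(/\s+/g, " ").trim();
}

function matchName(
  pattern: string,
  candidate: string,
  jaroWinkler: Scorer,
  tfidfCosine: Scorer
): MatchResult | null {
  const p = normalizeName(pattern);
  const c = normalizeName(candidate);

  // Pass 1: exact match on normalized names.
  if (p === c) return { pass: "exact", match_confidence: 1.0 };

  // Pass 2: Jaro-Winkler at or above 0.88.
  const jw = jaroWinkler(p, c);
  if (jw >= 0.88) return { pass: "jaro_winkler", match_confidence: jw };

  // Pass 3: TF-IDF cosine at or above 0.72.
  const cos = tfidfCosine(p, c);
  if (cos >= 0.72) return { pass: "tfidf_cosine", match_confidence: cos };

  return null; // no pass accepted the candidate
}
```

Each later pass only runs when the earlier one rejects the candidate, so the cheaper comparison handles the common case.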

Webhook payload schema

# Webhook POST to subscriber endpoint
{
  "event_type": "entity_sanctioned",
  "idempotency_key": "sha256:f3c8a...",   # hash of (dataset + record_id + change_type + ingest_ts)
  "detected_at": "2026-04-21T14:08:33Z",
  "dataset": "ofac_sdn",
  "agency": "ofac",
  "severity": "high",

  "entity": {
    "entity_id": "OFAC_UID:23456",
    "name": "EXAMPLE TRADING CO LLC",
    "aliases": ["EXAMPLE TRADING", "ETC LLC"],
    "sanctions_programs": ["SDGT", "IRAN"],
    "countries": ["IR", "AE"],
    "entity_type": "entity",
    "match_confidence": 1.0   # 1.0 for entity_id filter, <1.0 for name pattern
  },

  "change": {
    "change_type": "added",   # added | modified | removed
    "changed_fields": null,   # array of changed field names for 'modified', null for 'added'
    "previous_programs": null # previous sanctions programs for 'modified' events
  },

  "api_url": "https://api.ai-analytics.org/entity/OFAC_UID:23456"
}

For modified events, changed_fields lists which specific fields changed: ["aliases", "sanctions_programs"] for an OFAC record that gained a new program. This lets subscribers implement their own logic about whether a modification is material — a new alias is low urgency, a new IRAN program designation is high urgency.
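On the subscriber side, that materiality logic might look like the sketch below. `HIGH_RISK_PROGRAMS` and the two-level `Urgency` scale are a hypothetical policy chosen for illustration; the `Change` interface mirrors the change object in the payload above.

```typescript
interface Change {
  change_type: "added" | "modified" | "removed";
  changed_fields: string[] | null;
  previous_programs: string[] | null;
}

type Urgency = "high" | "low";

// Hypothetical subscriber policy: programs treated as material when newly gained.
const HIGH_RISK_PROGRAMS = new Set(["IRAN", "SDGT"]);

function modificationUrgency(change: Change, currentPrograms: string[]): Urgency {
  // In this policy, additions and removals always go to review.
  if (change.change_type !== "modified") return "high";

  const fields = change.changed_fields ?? [];
  if (fields.includes("sanctions_programs")) {
    // Compare against previous_programs to find newly gained designations.
    const before = new Set(change.previous_programs ?? []);
    const gained = currentPrograms.filter(p => !before.has(p));
    if (gained.some(p => HIGH_RISK_PROGRAMS.has(p))) return "high";
  }
  // Alias, address, or other non-program changes are treated as low urgency.
  return "low";
}
```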

Delivery pipeline

Change events enter a Cloudflare Queue immediately after the diff computation. Each event is delivered to every matched subscriber with at-least-once semantics, so a subscriber may occasionally receive the same event more than once; deduplication happens on the subscriber side using the idempotency_key.

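// Cloudflare Worker: webhook delivery queue consumer
export default {
  async queue(batch: MessageBatch<ChangeEvent>, env: Env): Promise<void> {
    for (const msg of batch.messages) {
      const event = msg.body;
      const subscribers = await matchSubscribers(event, env.DB);

      // Ack or retry once per message, after every subscriber has been tried.
      // Acking inside the subscriber loop would drop the event for any
      // subscriber whose delivery failed after an earlier one succeeded.
      let allDelivered = true;
      for (const sub of subscribers) {
        const delivered = await attemptDelivery(sub, event, env);
        if (!delivered) allDelivered = false;
      }

      if (allDelivered) {
        msg.ack();
      } else {
        // A retry redelivers to every matched subscriber; duplicates are
        // absorbed by the subscriber-side idempotency_key check.
        msg.retry({ delaySeconds: backoffSeconds(msg.attempts) });
      }
    }
  }
};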

function backoffSeconds(attempt: number): number {
  // 30s, 5min, 20min, then DLQ after 3 retries
  const delays = [30, 300, 1200];
  return delays[Math.min(attempt - 1, delays.length - 1)];
}

HMAC signing uses a per-subscriber secret (not one secret shared across all subscribers), so each subscriber can verify that a payload came from the hub and was not altered in transit:

async function signedDelivery(
  url: string, payload: object, secret: string
): Promise<Response> {
  const body = JSON.stringify(payload);
  const key = await crypto.subtle.importKey(
    'raw', new TextEncoder().encode(secret),
    { name: 'HMAC', hash: 'SHA-256' }, false, ['sign']
  );
  const sig = await crypto.subtle.sign('HMAC', key, new TextEncoder().encode(body));
  const hex = Array.from(new Uint8Array(sig))
    .map(b => b.toString(16).padStart(2, '0')).join('');

  return fetch(url, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Hub-Signature-256': `sha256=${hex}`,
      'X-Delivery-Id': crypto.randomUUID(),
    },
    body,
  });
}
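On the receiving end, a subscriber can check the X-Hub-Signature-256 header before trusting the payload. The sketch below assumes a Node.js receiver and uses the `node:crypto` module; `verifySignature` is a hypothetical helper name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify the X-Hub-Signature-256 header against the raw request body.
// rawBody must be the exact string received, not a re-serialized object,
// because any change in whitespace or key order changes the HMAC.
function verifySignature(
  rawBody: string,
  header: string | undefined,
  secret: string
): boolean {
  if (!header || !header.startsWith("sha256=")) return false;
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(header.slice("sha256=".length), "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

A constant-time comparison avoids leaking, through response timing, how many leading bytes of a forged signature were correct.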

Detection latency by source

| Dataset | Polling interval | Median detection | Subscriber delivery (p50) |
|---|---|---|---|
| OFAC SDN / Consolidated | 10 min | 5–10 min | +45s |
| FinCEN 314(a) / FBAR | 10 min | 5–10 min | +45s |
| SAM.gov exclusions | 30 min | 15–30 min | +45s |
| BIS Entity List / UFLPA | 60 min | 30–60 min | +45s |
| SEC EDGAR (8-K, 10-K) | RSS every 5 min | 5–10 min | +45s |
| FDA warning letters | RSS every 15 min | 10–15 min | +45s |
| DOJ press releases | RSS every 15 min | 10–15 min | +45s |
| CFPB enforcement actions | RSS every 15 min | 10–15 min | +45s |

The 45-second subscriber delivery latency covers queue consumer pickup (typically 5–15s), subscriber filter evaluation (8ms), and webhook POST (280ms median). The detection latency is dominated by source polling cadence, not delivery infrastructure.

Entity bridge disambiguation

A common scenario: a compliance team monitors a portfolio company under its trade name. When OFAC adds a subsidiary using a slightly different name, the name-pattern filter picks it up with match_confidence 0.81 (below the 0.88 Jaro-Winkler threshold but above the 0.72 TF-IDF cosine threshold, so it matches on the third pass). The alert payload includes:

{
  "event_type": "entity_sanctioned",
  "entity": {
    "name": "ACME TRADING HOLDINGS LTD",      # new OFAC entry
    "match_confidence": 0.81,
    "matched_pattern": "Acme Corp",            # subscriber's watch pattern
    "possible_parent": {                        # entity bridge lookup
      "entity_id": "CIK:0001234567",
      "name": "ACME CORP",
      "relationship": "possible_subsidiary",   # based on SEC Exhibit 21
      "relationship_confidence": 0.72
    }
  }
}

The possible_parent field is populated by checking whether the newly sanctioned entity appears in the SEC Exhibit 21 subsidiary map for any entity the subscriber is watching. This is not a guaranteed match — it is a signal that warrants human review. The alert always includes the raw OFAC data so the compliance team can make the final determination.

Rate limiting and deduplication

A bulk OFAC update can simultaneously add 50 new entities — for example during a large Russia-linked sanctions package. Without rate limiting, all 50 events would land on the subscriber's webhook endpoint within 60 seconds, which most compliance workflows are not designed to handle.

We group bulk updates from a single ingest run into batched deliveries: all changes from the same ingest cycle are bundled into a single webhook POST with an events array of up to 25 entries. Changes beyond 25 are queued for subsequent delivery windows. This reduces webhook volume by 10–20× during large sanctions packages while preserving event content fidelity.
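The batching rule amounts to chunking one ingest cycle's events into groups of at most 25. A minimal sketch, where `batchEvents` and `MAX_EVENTS_PER_POST` are hypothetical names:

```typescript
// Batch cap stated above: at most 25 events per webhook POST.
const MAX_EVENTS_PER_POST = 25;

// Split one ingest cycle's events into ordered batches of at most `size`.
function batchEvents<T>(events: T[], size: number = MAX_EVENTS_PER_POST): T[][] {
  if (size < 1) throw new RangeError("size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < events.length; i += size) {
    batches.push(events.slice(i, i + size));
  }
  return batches;
}
```

For the 50-entity package in the example, this yields two POSTs of 25 events each instead of 50 separate deliveries.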

Idempotency uses the idempotency_key field, which is a SHA-256 hash of dataset + record_id + change_type + ingest_timestamp. If the same ingest event is delivered twice due to queue retry, the subscriber can detect the duplicate by caching received idempotency keys. We recommend a 24-hour cache window; the same record will not generate a second event within 24 hours unless the record changes again.
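A subscriber-side duplicate check with the recommended 24-hour window could be sketched as below. This is an in-memory illustration only; `IdempotencyCache` is a hypothetical class, and a production receiver would likely back it with a shared store so that duplicates are caught across processes.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

class IdempotencyCache {
  // idempotency_key -> expiry time in ms since epoch
  private seen = new Map<string, number>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number = DAY_MS, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, mainly for testing
  }

  // Returns true the first time a key is seen within the TTL window,
  // false when the key is a duplicate that should be skipped.
  firstSeen(key: string): boolean {
    const t = this.now();
    // Evict expired keys so the map does not grow without bound.
    for (const [k, expiry] of this.seen) {
      if (expiry <= t) this.seen.delete(k);
    }
    if (this.seen.has(key)) return false;
    this.seen.set(key, t + this.ttlMs);
    return true;
  }
}
```

A webhook handler would call `firstSeen(event.idempotency_key)` after signature verification and return a success status without reprocessing when it yields false.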

