
When a Breach Happens: How One Cyber Incident Surfaces Across Federal Disclosure Regimes

9 min read · AI Analytics
CISA · SEC · Cybersecurity · Open Data · Data Engineering

A company gets breached. Where does the US government record it? Not in one place but in several, and which ones depends on what was exploited and whose data was taken. There is no national cyber-breach registry; instead, a single incident throws off separate disclosures into separate systems, each triggered by a different fact and keyed in a different way. Understanding the three main ones is how you reconstruct an incident from public data.

Three regimes, three triggers

  • CISA Known Exploited Vulnerabilities: the technical trigger. When a vulnerability is being actively exploited in the wild, CISA adds it to the KEV catalog, keyed by CVE. A KEV entry says a flaw is being used, not who got hit.
  • SEC 8-K, Item 1.05: the corporate trigger. A public company that experiences a cyber incident it deems material must disclose it on an 8-K, generally within four business days of the materiality determination. Keyed by filer (CIK).
  • HHS OCR breach portal: the sectoral trigger. A HIPAA-covered entity that suffers a breach of unsecured protected health information affecting 500 or more individuals must report it to HHS, which posts it publicly. Keyed by covered-entity name.

The same ransomware event at a hospital chain could appear in all three: a KEV entry for the exploited VPN flaw, an 8-K if the chain is publicly traded, and an OCR breach report for the patient records. A breach at a private manufacturer might appear in none of them. The coverage is a patchwork, not a census.
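
The patchwork can be made concrete with a small sketch. The `Incident` fields and the `regimes_triggered` helper are hypothetical names introduced here for illustration; the three checks only restate the triggers listed above and are not a legal determination of any reporting duty.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical summary of one incident; field names are illustrative."""
    exploited_cve_in_kev: bool   # the exploited flaw is listed in CISA KEV
    is_public_filer: bool        # the victim files with the SEC
    deemed_material: bool        # the victim judged the incident material
    is_hipaa_covered: bool       # the victim is a HIPAA-covered entity
    individuals_affected: int    # people whose unsecured health data was breached


def regimes_triggered(incident: Incident) -> set[str]:
    """Return the regimes in which this incident could surface."""
    regimes = set()
    if incident.exploited_cve_in_kev:
        regimes.add("KEV")
    if incident.is_public_filer and incident.deemed_material:
        regimes.add("SEC 8-K")
    if incident.is_hipaa_covered and incident.individuals_affected >= 500:
        regimes.add("HHS OCR")
    return regimes


# Made-up cases mirroring the two examples in the text.
hospital_chain = Incident(True, True, True, True, 120_000)
private_manufacturer = Incident(False, False, False, False, 0)

print(sorted(regimes_triggered(hospital_chain)))
print(sorted(regimes_triggered(private_manufacturer)))
```

An empty set for the manufacturer means "not observable in these feeds", not "nothing happened".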

The join: victim name and a date window

As with the other federal pipelines, there is no incident identifier that spans the three. KEV keys on CVE, the SEC on CIK, OCR on entity name. Linking them for one incident means resolving the victim organization's name across SEC and OCR and aligning a date window — and, where the 8-K or OCR narrative names the CVE, threading back to KEV. Entity resolution does the work again.
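
A minimal sketch of that name-plus-window join follows. It assumes exact matching on a crudely normalized name, a hand-picked suffix list, and a 90-day window; all three are illustrative choices rather than anything the regimes prescribe, and real entity resolution usually needs fuzzier matching. The helper names and sample records are hypothetical.

```python
import re
from datetime import date, timedelta

# Assumed list of corporate suffixes to ignore; extend for real data.
_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co",
             "company", "llc", "ltd", "lp", "holdings"}


def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in _SUFFIXES)


def link_filings(sec_rows, ocr_rows, window_days=90):
    """Pair SEC and OCR rows whose normalized names match within a date window.

    Each row is a (name, date) tuple. Returns a list of (sec_row, ocr_row) pairs.
    """
    window = timedelta(days=window_days)
    by_name = {}
    for row in ocr_rows:
        by_name.setdefault(normalize_name(row[0]), []).append(row)
    pairs = []
    for sec_row in sec_rows:
        for ocr_row in by_name.get(normalize_name(sec_row[0]), []):
            if abs(sec_row[1] - ocr_row[1]) <= window:
                pairs.append((sec_row, ocr_row))
    return pairs


# Made-up records for illustration only.
sec_rows = [("Example Health Systems, Inc.", date(2024, 3, 4))]
ocr_rows = [
    ("EXAMPLE HEALTH SYSTEMS INC", date(2024, 4, 15)),
    ("Example Health Systems", date(2021, 1, 10)),  # same name, outside the window
]
pairs = link_filings(sec_rows, ocr_rows)
print(pairs)
```

The date window is what keeps an older, unrelated breach at the same organization from being linked to the current filing.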

# One incident, three unconnected federal feeds, no shared key.
# Each is triggered by a DIFFERENT fact and keyed differently.

import requests

# 1. CISA KEV — the exploited VULNERABILITY (keyed by CVE, not by victim).
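resp = requests.get(
    "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json",
    timeout=30)
resp.raise_for_status()
# The feed is a JSON object; the entries live under "vulnerabilities".
kev = resp.json()["vulnerabilities"]  # -> list of {cveID, vendorProject, product, dateAdded, ...}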

# 2. SEC 8-K Item 1.05 — the MATERIAL cyber incident, for public filers
#    (keyed by CIK/ticker). Pulled from EDGAR full-text + filing index.

# 3. HHS OCR breach portal — HIPAA-covered breaches >= 500 individuals
#    (keyed by covered-entity name). Public "wall of shame".
ocr = requests.get(
    "https://ocrportal.hhs.gov/ocr/breach/breach_report.jsf", timeout=30)  # HTML table

# To link them for ONE incident you join on victim ORGANIZATION name + date window
# (and, where possible, the CVE that SEC/OCR narratives name). There is no
# federal incident ID that spans all three.
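
The "thread back to KEV" step can be sketched as a regular-expression scan of a disclosure narrative followed by a lookup on `cveID`, the field named in the feed comment above. The function names and sample entries here are hypothetical, and the CVE IDs are deliberately fake placeholders, not real catalog entries.

```python
import re

# CVE IDs look like CVE-<4-digit year>-<4 or more digits>.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def cves_in_text(text: str) -> set[str]:
    """Return the set of CVE IDs mentioned in free text, upper-cased."""
    return {match.upper() for match in CVE_PATTERN.findall(text)}


def thread_to_kev(narrative: str, kev_entries: list[dict]) -> list[dict]:
    """Return the KEV entries whose cveID is named in the narrative."""
    named = cves_in_text(narrative)
    return [e for e in kev_entries if e.get("cveID", "").upper() in named]


# Placeholder entries shaped like KEV records; the IDs are not real.
kev_sample = [
    {"cveID": "CVE-0000-00001", "vendorProject": "ExampleVendor", "product": "ExampleVPN"},
    {"cveID": "CVE-0000-00002", "vendorProject": "OtherVendor", "product": "OtherProduct"},
]
narrative = "The threat actor exploited cve-0000-00001 in a remote access appliance."
hits = thread_to_kev(narrative, kev_sample)
print([e["cveID"] for e in hits])
```

Many narratives never name a CVE, so an empty result here says nothing about whether a KEV-listed flaw was involved.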

The mismatches that make it hard

  • Different clocks. KEV adds a CVE when exploitation is observed; the SEC clock starts at the materiality call; OCR's at breach discovery. The same event has three different dates.
  • Different thresholds. 8-K turns on “materiality”; OCR on a 500-individual count; KEV on active exploitation. An incident can clear one bar and not the others.
  • Different subjects. KEV names a product/vendor, OCR names the breached entity, the 8-K names the filer. The “who” differs by regime, so the join is rarely one-to-one.
  • Silent gaps. Private companies with no health data and no SEC reporting obligation can be breached and appear in none of the three. Absence here is not evidence of safety.
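
The "different clocks" point suggests measuring how far apart the regime dates sit for a linked incident before fixing a join window. The sketch below assumes you already hold one date per regime (for example the KEV `dateAdded`, the 8-K filing date, and the OCR submission date); the function name and the sample dates are hypothetical.

```python
from datetime import date
from typing import Optional


def clock_spread_days(kev_date_added: Optional[date],
                      sec_filing_date: Optional[date],
                      ocr_submission_date: Optional[date]) -> Optional[int]:
    """Days between the earliest and latest regime date.

    Returns None when fewer than two regimes recorded the incident,
    since a single date has no spread to measure.
    """
    present = [d for d in (kev_date_added, sec_filing_date, ocr_submission_date)
               if d is not None]
    if len(present) < 2:
        return None
    return (max(present) - min(present)).days


# Made-up dates for one linked incident.
spread = clock_spread_days(date(2024, 1, 10), date(2024, 2, 1), date(2024, 3, 20))
print(spread)
```

Looking at the distribution of this spread across already-linked incidents is one way to choose a window that is wide enough to catch late reports without sweeping in unrelated events.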

Assembled with those caveats, the three feeds answer questions none can alone: which actively exploited CVEs show up in material corporate disclosures, how fast sectors report, and where the same victim surfaces under more than one regime. The map is partial by law, but it is public and it is joinable.


Related writing: Information Rights Have Two Sides — a cross-dataset join that, unlike this one, has an exact shared key (country).

Related writing: The Recall Web — the same shape in consumer safety: one question fragmented across agencies with no shared key.

See also: CISA Known Exploited Vulnerabilities — the technical feed at the center of this join.