The Wall of Shame: what the HHS-OCR HIPAA breach database reveals about healthcare data security
Since 2009, the Department of Health and Human Services Office for Civil Rights has maintained a public list of every healthcare data breach affecting 500 or more patients. Journalists nicknamed it the “Wall of Shame.” As of mid-2026, it holds more than 5,000 entries and documents breaches affecting upward of 500 million individuals — more than the entire US population, a consequence of the same individual appearing in multiple breached records. The database is free, downloadable as a CSV, and almost entirely ignored by the compliance and security communities that should be studying it most carefully.
The statutory foundation: 45 CFR Part 164 Subpart D
The HIPAA Breach Notification Rule lives at 45 CFR Part 164 Subpart D, added by the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009. The rule imposes three notification obligations when a covered entity experiences a breach of unsecured protected health information (PHI):
- Individual notice: affected individuals must be notified without unreasonable delay and no later than 60 calendar days after discovery. For breaches involving more than 500 residents of a state or jurisdiction, notice to prominent media outlets serving that area is required in addition to individual notice.
- HHS notice: breaches affecting 500 or more individuals must be reported to HHS at the same time as individual notification. Smaller breaches go into a log that covered entities submit to HHS annually, within 60 days of the end of the calendar year.
- Business associate notification: a business associate that discovers a breach must notify the covered entity without unreasonable delay and no later than 60 days after discovery.
“Breach” has a specific meaning under the rule: the acquisition, access, use, or disclosure of PHI in a manner not permitted by the Privacy Rule that compromises the security or privacy of the PHI. An impermissible use or disclosure is presumed to be a breach; a covered entity can rebut the presumption only by demonstrating through a four-factor risk assessment (the nature and extent of the PHI involved, the unauthorized person who used or received it, whether it was actually acquired or viewed, and the extent to which the risk has been mitigated) that the probability of compromise is low. Few organizations successfully invoke this exception.
The 500-person threshold for immediate HHS notification is what populates the Wall of Shame. HHS is required by statute to post all such reports on a public website. OCR has done so since February 2010, when the first entries appeared.
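The deadline arithmetic in the rule is simple enough to sketch in a few lines. The helper below is hypothetical (the name `notification_deadlines` and its return shape are illustrative, not part of any HHS tooling), and it encodes only the outer 60-day limits; the separate requirement to act "without unreasonable delay" cannot be reduced to a date calculation.

```python
from datetime import date, timedelta

LARGE_BREACH_THRESHOLD = 500          # individuals; triggers prompt HHS notice and public posting
NOTICE_WINDOW = timedelta(days=60)    # outer limit used throughout the rule


def notification_deadlines(discovered: date, individuals_affected: int) -> dict:
    """Return the latest permissible dates for individual and HHS notice."""
    individual_deadline = discovered + NOTICE_WINDOW
    is_large = individuals_affected >= LARGE_BREACH_THRESHOLD
    if is_large:
        # Large breaches: HHS is notified on the same clock as individuals.
        hhs_deadline = individual_deadline
    else:
        # Small breaches: logged, then submitted within 60 days of year end.
        hhs_deadline = date(discovered.year, 12, 31) + NOTICE_WINDOW
    return {
        "individual_notice_by": individual_deadline,
        "hhs_notice_by": hhs_deadline,
        "posted_publicly": is_large,
    }


large = notification_deadlines(date(2024, 2, 21), 600)
small = notification_deadlines(date(2022, 6, 1), 200)
```

In this sketch a 200-person breach discovered in June 2022 would not be due at HHS until early the following year, which is one reason small incidents surface so much later than large ones.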
Where the data lives and how to get it
The interactive portal is at https://ocrportal.hhs.gov/ocr/breach/breach_report.jsf. It allows filtering by state, covered entity type, breach type, and date range, and supports exporting up to 500 rows at a time from the current view. For bulk work, OCR makes a full CSV download available from the HIPAA breach notification page at https://www.hhs.gov/hipaa/for-professionals/breach-notification/breach-reporting/index.html. The CSV is updated as new reports are posted — typically within days of OCR receiving them — and includes both active investigations and archived cases.
The CSV structure is straightforward. Each row is one breach report with these key fields:
- Name of Covered Entity: the legal name of the organization that filed the report. This is always the covered entity even if a business associate caused the breach.
- State: two-letter state abbreviation for the covered entity's principal place of business.
- Covered Entity Type: one of Healthcare Provider, Health Plan, Healthcare Clearing House, or Business Associate.
- Individuals Affected: the count reported by the covered entity, often an estimate at time of filing and updated later.
- Date of Breach: the date the breach occurred or was first discovered, whichever is earlier; often reported as the first day of a date range.
- Type of Breach: a controlled vocabulary: Hacking/IT Incident, Improper Disposal, Loss, Theft, Unauthorized Access or Disclosure, or Unknown.
- Location of Breached Information: another controlled set: Desktop Computer, Electronic Medical Record, Email, Laptop, Network Server, Other, Other Portable Electronic Device, Paper/Film.
- Business Associate Involved: a Yes/No field indicating whether a business associate was implicated.
- Web Description: free-text narrative, added after OCR closes or settles an investigation; contains resolution details, penalties, and corrective action plan requirements.
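Because two of these fields use controlled vocabularies, a quick sanity check can flag values that fall outside them before any aggregation. The sketch below assumes multi-value cells are comma-separated, which should be verified against the actual export; the function name `unexpected_values` is illustrative.

```python
import pandas as pd

# Controlled vocabularies as listed in the field descriptions.
BREACH_TYPES = {
    "Hacking/IT Incident", "Improper Disposal", "Loss", "Theft",
    "Unauthorized Access or Disclosure", "Unknown",
}
LOCATIONS = {
    "Desktop Computer", "Electronic Medical Record", "Email", "Laptop",
    "Network Server", "Other", "Other Portable Electronic Device", "Paper/Film",
}


def unexpected_values(series: pd.Series, allowed: set, sep: str = ",") -> set:
    """Split multi-value cells on `sep` and return values not in `allowed`."""
    values = (
        series.dropna()
        .astype(str)
        .str.split(sep)   # assumption: multi-value cells are comma-separated
        .explode()
        .str.strip()
    )
    return set(values[values != ""]) - allowed


sample = pd.Series(["Laptop, Email", "Network Server", None, "Fax Machine"])
leftovers = unexpected_values(sample, LOCATIONS)
```

Anything returned by the check is either a vocabulary change on OCR's side or a parsing problem, and both are worth knowing about before grouping on the column.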
The dataset does not carry a CC0 dedication, but as a work of the US government it is not subject to copyright protection under 17 U.S.C. § 105, and there are no restrictions on downstream use.
Scale: 500 million exposures and one catastrophic outlier
Aggregating the Individuals Affected column across all 5,000-plus entries produces a running total that exceeds 500 million. That figure is not the number of Americans whose health records have been exposed — the same patient appearing in records at a hospital, their insurer, and a pharmacy benefits manager can be counted three times across three separate breaches — but it illustrates the accumulated exposure.
One entry distorts every aggregate calculation: the February 2024 ransomware attack on Change Healthcare, a subsidiary of UnitedHealth Group that processes roughly 15 billion healthcare transactions per year and touches approximately one in three American patient records. The ALPHV/BlackCat ransomware group exfiltrated data before deploying encryption. UnitedHealth reported approximately 100 million affected individuals to OCR in October 2024 and raised its estimate to approximately 190 million in January 2025, making it the largest healthcare data breach in US history by a factor of more than two.
The previous record-holder was the 2015 Anthem breach: 78.8 million records stolen via spearphishing and credential theft, resulting in a $16 million OCR settlement in 2018, the largest HIPAA settlement ever at the time. The financial consequences of the Change Healthcare breach for UnitedHealth are still unfolding: the company disclosed over $3 billion in direct costs through mid-2025, and OCR announced a formal investigation in March 2024.
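One way to see how much a single entry can drive the aggregates is to compute summary statistics with and without the largest row. In the sketch below, the two largest counts are the figures cited in the text (190 million and 78.8 million); every other value is an invented placeholder standing in for a typical entry, so the output illustrates the effect rather than the real dataset.

```python
from statistics import mean, median

# Two figures come from the text (190M, 78.8M); the rest are invented
# placeholders, not real Wall of Shame entries.
affected = [190_000_000, 78_800_000, 1_200_000, 45_000, 12_000, 3_500, 900, 600]


def outlier_report(counts):
    """Summarize how much the single largest entry drives the aggregates."""
    total = sum(counts)
    largest = max(counts)
    rest = sorted(counts)[:-1]  # every entry except the largest
    return {
        "largest_share_of_total": largest / total,
        "mean_with": mean(counts),
        "mean_without": mean(rest),
        "median_with": median(counts),
        "median_without": median(rest),
    }


report = outlier_report(affected)
```

The mean moves dramatically when the top entry is dropped while the median barely changes, which is a reasonable argument for reporting medians, or totals with the outlier broken out separately, whenever this column is summarized.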
Breach type trends: the shift from laptops to ransomware
The 2009–2014 period in the Wall of Shame is dominated by physical loss of devices and media, with Theft as the leading breach type: laptops stolen from cars, USB drives lost at airports, boxes of paper records discarded in dumpsters. The Office for Civil Rights settled dozens of cases in this era with organizations that had failed to encrypt portable devices, a straightforward technical control that NIST had recommended for over a decade. Massachusetts Eye and Ear Infirmary paid $1.5 million in 2012 after an unencrypted laptop was stolen. Concentra Health Services paid $1.725 million in 2014 for the same failure pattern.
The inflection point is visible in the data around 2015–2016. Theft begins to decline as covered entities finally encrypt their laptops. Simultaneously, Hacking/IT Incident begins a steep ascent. By 2022, Hacking/IT Incident accounts for more than 80% of all breached individuals in any given year, even if it represents a smaller share of breach counts (because individual hacking incidents affect far more people than a single stolen laptop).
The Unauthorized Access or Disclosure category — which covers insider threats, snooping employees, and accidental disclosures — has remained roughly constant in absolute terms throughout the dataset's history. Its share of total affected individuals has shrunk as hacking incidents grew, but it remains a persistent source of smaller breaches. Hospital employees looking up celebrity patient records, staff accessing ex-partner health information, and misdirected faxes all fall into this category.
The Location of Breached Information field shows a parallel shift. Laptop peaks around 2012, then declines. Network Server rises through 2016–2020 as attackers move from endpoint theft to direct network compromise. Email spikes in 2018–2020 as phishing campaigns targeting healthcare credentials became a dominant attack vector. By 2023, Network Server and Email together account for the vast majority of Hacking/IT Incident entries.
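The distinction between share of reports and share of individuals is easy to lose in a single groupby, so it can help to compute both side by side. This is a minimal sketch that assumes the snake_case column names `breach_year`, `type_of_breach`, and `individuals_affected`; the toy rows are invented for illustration.

```python
import pandas as pd


def breach_type_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Per year and breach type: share of reports and share of individuals."""
    grouped = (
        df.groupby(["breach_year", "type_of_breach"])
        .agg(
            reports=("individuals_affected", "size"),
            individuals=("individuals_affected", "sum"),
        )
        .reset_index()
    )
    # Yearly totals, aligned row-for-row with `grouped`.
    yearly = grouped.groupby("breach_year")[["reports", "individuals"]].transform("sum")
    grouped["share_of_reports"] = grouped["reports"] / yearly["reports"]
    grouped["share_of_individuals"] = grouped["individuals"] / yearly["individuals"]
    return grouped


# Invented toy data: one large hacking incident among several small breaches.
toy = pd.DataFrame({
    "breach_year": [2022] * 5,
    "type_of_breach": ["Hacking/IT Incident", "Theft", "Theft",
                       "Unauthorized Access or Disclosure", "Loss"],
    "individuals_affected": [900_000, 1_000, 2_000, 1_500, 700],
})
shares = breach_type_shares(toy)
```

In the toy data, hacking is one report in five but accounts for nearly all affected individuals, which is the same shape of result the real dataset shows at much larger scale.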
Business associate exposure: vendor risk as the critical vector
The Business Associate Involved field is one of the most analytically useful columns in the dataset, and it understates the true picture. When a business associate causes a breach, the covered entity whose patients were affected files the report — but a single business associate compromise can generate dozens of separate Wall of Shame entries, one per covered entity client.
The 2023 mass exploitation of the MOVEit managed file transfer software, through a zero-day SQL injection vulnerability (CVE-2023-34362), illustrates this multiplier effect. The Clop ransomware group compromised MOVEit installations used by dozens of healthcare organizations and their vendors. The breach appeared in the Wall of Shame under the names of hospitals, health plans, and third-party administrators across the country: each one a separate entry, each one traceable to a single vulnerability in a piece of vendor software whose code none of them controlled.
Change Healthcare is a more extreme version of the same dynamic. Because Change Healthcare processed claims for hospitals, physician groups, pharmacies, and insurers nationwide, a single breach of one vendor generated exposure for hundreds of millions of patients. The covered entities that filed breach reports with OCR were not the organizations that were attacked — they were the downstream clients of the organization that was attacked, and they had limited visibility into Change Healthcare's security posture before February 2024.
Business associate breaches in the dataset disproportionately involve large numbers of affected individuals because vendors, by definition, aggregate patient data across multiple covered entities. A hospital breach might affect the patients of that hospital. A billing company breach might affect the patients of every hospital that uses that company for billing.
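Two rough cuts at vendor exposure can be sketched with the columns already described: grouping entries whose narrative mentions a given vendor or product, and comparing breach sizes by the business associate flag. The function names and the toy rows below are hypothetical, and because Web Description is only populated after OCR closes a case, keyword matching of this kind will undercount recent incidents.

```python
import pandas as pd


def vendor_cluster(df: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Entries whose narrative mentions `keyword` (case-insensitive, literal)."""
    mask = (
        df["web_description"]
        .fillna("")
        .str.contains(keyword, case=False, regex=False)
    )
    return df[mask]


def median_by_ba_flag(df: pd.DataFrame) -> pd.Series:
    """Median individuals affected, split on the Business Associate flag."""
    return df.groupby("business_associate_involved")["individuals_affected"].median()


# Invented toy rows for illustration only.
toy = pd.DataFrame({
    "name_of_covered_entity": ["A Hospital", "B Health Plan", "C Clinic", "D Clinic"],
    "individuals_affected": [50_000, 120_000, 800, 1_200],
    "business_associate_involved": ["Yes", "Yes", "No", "No"],
    "web_description": [
        "A vendor's MOVEit server was exploited.",
        "moveit transfer vulnerability at a claims vendor.",
        None,
        "An unencrypted laptop was stolen.",
    ],
})
cluster = vendor_cluster(toy, "MOVEit")
medians = median_by_ba_flag(toy)
```

Summing `individuals_affected` across a cluster gives a vendor-level view of one incident that the per-entity rows otherwise scatter across the dataset.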
Resolution narrative mining: what the Web Description field reveals
The Web Description field is populated by OCR after it closes an investigation, either through a resolution agreement and settlement or through a finding of no violation. For high-profile cases, these narratives are dense with technically useful information.
The largest OCR settlements illuminate the compliance failures that most commonly underlie serious breaches:
- Anthem ($16 million, 2018): the resolution agreement cited failure to conduct an enterprise-wide risk analysis, insufficient procedures to review information system activity, and inadequate technical controls to prevent unauthorized access. The breach itself involved sophisticated credential theft following spearphishing; the settlement was about the missing systematic security program that allowed those credentials to be so valuable once stolen.
- Premera Blue Cross ($6.85 million, 2020): OCR found failures in risk analysis, risk management, information system activity review, and technical safeguards. The Premera breach, attributed to a nation-state actor, began with a phishing email in May 2014 but was not discovered until January 2015, an eight-month dwell time that allowed extensive data exfiltration. The corrective action plan required enterprise-wide risk analysis completion within 90 days and a security monitoring program within 180.
- MAPFRE Life Insurance ($2.2 million, 2017): a USB storage device containing PHI for 2,209 individuals was stolen from the company's IT department, where it had been left unsecured overnight. The settlement reflects the pre-encryption era pattern: a small breach in absolute terms but a large settlement, because OCR found that MAPFRE had failed to conduct a risk analysis and implement risk management plans, contrary to its earlier representations, and had not deployed encryption on laptops and removable media until September 2014, years after the theft. Promising a control and then failing to deliver it draws a higher penalty than the size of the breach alone would suggest.
The resolution narratives also contain corrective action plan requirements that are informative beyond the specific case. Common requirements include: completion of an enterprise-wide risk analysis; revision of policies and procedures for access control, audit controls, and encryption; workforce training with documentation; implementation of a security incident response plan with defined escalation procedures; and periodic reporting to OCR for one to two years post-settlement.
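A first pass at mining these narratives can be as simple as tagging each one with the corrective-action themes it mentions. The patterns below are hypothetical starting points chosen to mirror the requirements listed above, not a validated taxonomy, and the sample sentences are invented.

```python
import re
from collections import Counter

# Hypothetical keyword patterns for recurring corrective-action themes.
THEMES = {
    "risk_analysis": re.compile(r"risk\s+(analysis|assessment)", re.I),
    "encryption": re.compile(r"encrypt", re.I),
    "training": re.compile(r"\btrain(ing|ed)?\b", re.I),
    "incident_response": re.compile(r"incident\s+response", re.I),
    "audit_controls": re.compile(r"audit\s+controls?|system\s+activity\s+review", re.I),
}


def tag_narrative(text):
    """Return the set of themes whose pattern appears in one narrative."""
    if not text:
        return set()
    return {name for name, pattern in THEMES.items() if pattern.search(text)}


def theme_counts(narratives):
    """Count how many narratives mention each theme at least once."""
    counts = Counter()
    for text in narratives:
        counts.update(tag_narrative(text))
    return counts


# Invented sample narratives, including an empty one.
samples = [
    "OCR required an enterprise-wide risk analysis and workforce training.",
    "The entity agreed to encrypt all laptops and revise its incident response plan.",
    None,
]
counts = theme_counts(samples)
```

Counting each theme once per narrative, rather than once per mention, keeps long settlement write-ups from dominating the tallies.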
Python analysis: loading and working with the CSV
The bulk CSV download is clean enough to load directly with pandas. The main preprocessing steps are date parsing and handling the occasional multi-value entry in the Location field.
import pandas as pd
import matplotlib.pyplot as plt

# Load the HHS-OCR breach CSV
df = pd.read_csv("breach_report.csv")

# Normalize column names to snake_case
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

# Parse date columns after loading. Header names can differ between exports,
# so check them against the downloaded file and convert only those present.
for col in ("date_of_breach", "date_reported"):
    if col in df.columns:
        df[col] = pd.to_datetime(df[col], errors="coerce")

# Extract year from breach date
df["breach_year"] = df["date_of_breach"].dt.year

# --- Breach rate by covered entity type ---
type_summary = (
    df.groupby("covered_entity_type")
    .agg(
        breach_count=("name_of_covered_entity", "count"),
        total_affected=("individuals_affected", "sum"),
        median_affected=("individuals_affected", "median"),
    )
    .sort_values("total_affected", ascending=False)
)
print(type_summary)

# --- Hacking/IT incident trend 2010-2023 ---
hacking = df[
    (df["type_of_breach"] == "Hacking/IT Incident")
    & (df["breach_year"].between(2010, 2023))
]
hacking_by_year = (
    hacking.groupby("breach_year")["individuals_affected"]
    .sum()
    .reset_index()
)

fig, ax = plt.subplots(figsize=(12, 5))
ax.bar(hacking_by_year["breach_year"], hacking_by_year["individuals_affected"] / 1e6)
ax.set_xlabel("Year")
ax.set_ylabel("Individuals affected (millions)")
ax.set_title("HIPAA Hacking/IT Incident breach volume 2010-2023")
plt.tight_layout()
plt.savefig("hacking_trend.png", dpi=150)

# --- Per-capita breach analysis by state ---
# Requires Census population data (e.g., from Census API or ACS 5-year)
pop = pd.read_csv("state_population.csv")  # columns: state, population
state_breaches = (
    df.groupby("state")["individuals_affected"]
    .sum()
    .reset_index()
    .merge(pop, on="state")
)
state_breaches["affected_per_100k"] = (
    state_breaches["individuals_affected"] / state_breaches["population"] * 100_000
)
top_states = state_breaches.nlargest(10, "affected_per_100k")
print(top_states[["state", "affected_per_100k"]])

A few patterns emerge from this analysis. Healthcare Providers dominate breach count but Health Plans often lead on total individuals affected, because insurance records aggregate across many providers. The median breach at a Healthcare Provider affects a few thousand individuals; the median at a Health Plan is substantially larger. The per-capita state analysis tends to surface states with large single-employer health systems or states that were home to business associates with national client bases during years those vendors were breached.
The trend line for Hacking/IT Incidents from 2010 to 2023 is close to exponential growth punctuated by step changes at specific incidents: the 2015 Anthem breach, the 2020–2021 ransomware acceleration driven by the Ryuk, Conti, and REvil groups targeting hospitals during the COVID-19 pandemic, and the 2023 MOVEit exploitation campaign.
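Whether a series is "close to exponential" can be checked by fitting a straight line to the logarithm of the yearly totals: a good linear fit in log space implies roughly constant percentage growth. The sketch below uses a plain least-squares fit and a synthetic series that doubles every year, not real Wall of Shame totals; it also assumes every year has a positive total, since the logarithm of zero is undefined.

```python
import math


def log_linear_fit(years, totals):
    """Least-squares fit of log(total) = a + b * year; returns (a, b)."""
    logs = [math.log(t) for t in totals]  # requires strictly positive totals
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def annual_growth_rate(b):
    """Convert the fitted slope into a year-over-year growth rate."""
    return math.exp(b) - 1


def doubling_time_years(b):
    """Years needed for the fitted series to double."""
    return math.log(2) / b


# Synthetic series that doubles each year, for illustration only.
years = list(range(2010, 2016))
totals = [1_000 * 2 ** i for i in range(len(years))]
a, b = log_linear_fit(years, totals)
```

On real data the residuals from this fit are where the step changes show up: years far above the line are the ones worth reading entry by entry.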
Cross-reference opportunities
The Wall of Shame becomes substantially more useful when joined to other datasets:
CISA KEV. The Cybersecurity and Infrastructure Security Agency maintains the Known Exploited Vulnerabilities catalog at https://www.cisa.gov/known-exploited-vulnerabilities-catalog. Cross-referencing the date ranges of Hacking/IT Incident breaches against KEV entries allows analysts to identify which publicly documented vulnerabilities appear most often in the months preceding healthcare breach clusters. The MOVEit SQL injection (CVE-2023-34362) was publicly disclosed on May 31, 2023, and added to the KEV on June 2, 2023. Healthcare breach entries referencing vendor MOVEit exposure begin appearing in the Wall of Shame within weeks. The overlap between KEV entries and healthcare breach timing shows how little time defenders in this sector have between public disclosure and active exploitation.
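A minimal version of this cross-reference counts, for each KEV entry, the breaches dated within a fixed window after the entry was added. The sketch assumes KEV records are dictionaries carrying `cveID` and `dateAdded` (ISO date) keys, which should be checked against the catalog's current schema; the CVE identifiers and dates below are placeholders, and the 90-day window is an arbitrary choice.

```python
from datetime import date, timedelta


def breaches_after_kev(kev_entries, breach_dates, window_days=90):
    """For each KEV entry, count breaches dated within the window after dateAdded."""
    window = timedelta(days=window_days)
    counts = {}
    for entry in kev_entries:
        added = date.fromisoformat(entry["dateAdded"])
        counts[entry["cveID"]] = sum(
            1 for d in breach_dates if added <= d <= added + window
        )
    return counts


# Placeholder KEV entries and breach dates, invented for illustration.
kev = [
    {"cveID": "CVE-0000-0001", "dateAdded": "2023-06-01"},
    {"cveID": "CVE-0000-0002", "dateAdded": "2021-01-15"},
]
breach_dates = [date(2023, 6, 10), date(2023, 7, 30), date(2023, 12, 1), date(2021, 1, 14)]
counts = breaches_after_kev(kev, breach_dates)
```

Temporal proximity is only a lead, not evidence of cause: a spike after a KEV addition still has to be confirmed against the narratives or external reporting before a vulnerability is blamed for it.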
HHS-OIG exclusions. The HHS Office of Inspector General maintains the List of Excluded Individuals/Entities (LEIE) covering providers barred from federal healthcare programs for fraud, abuse, or other misconduct. Entity name matching between Wall of Shame entries and the LEIE is imperfect — the same organization may appear under different legal names — but name joins with Jaro-Winkler similarity surface cases where breach-involved organizations were subsequently excluded or where organizations with prior OIG exclusion history appear in breach data. This is a minority of cases but occasionally informative about systemic compliance failures at specific entities.
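For readers who have not used it, Jaro-Winkler similarity can be sketched in plain Python. This is a textbook version of the metric (prefix scale 0.1, prefix capped at four characters), written for illustration rather than speed; the light name normalization in `normalize` is an assumption of this sketch, and a production join would need far more careful entity resolution.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def normalize(name: str) -> str:
    """Illustrative cleanup: lowercase, drop punctuation, collapse spaces."""
    kept = "".join(c for c in name.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())


score = jaro_winkler(normalize("St. Example Health, Inc."), normalize("St Example Health Inc"))
```

Scores near 1.0 are candidates for manual review, not confirmed matches; a threshold around 0.9 is a common starting point but is an assumption to tune against hand-checked pairs.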
FDA medical device adverse events. The FDA MAUDE database covers adverse events involving medical devices, including cybersecurity incidents affecting networked medical equipment. Cross-referencing MAUDE device manufacturer names against Wall of Shame covered entity names identifies cases where device connectivity created the breach vector — insulin pump manufacturers whose devices were used at health systems that subsequently reported breaches involving network server access, for instance. This cross-reference requires careful entity resolution since MAUDE contains device manufacturer names while the Wall of Shame contains healthcare provider names; the link is indirect through device deployment records, which are not public.
SEC 8-K filings. Publicly traded healthcare companies are required to disclose material cybersecurity incidents to the SEC under rules effective December 2023. The Change Healthcare breach generated 8-K filings from UnitedHealth Group that can be read alongside the OCR breach report to understand the gap between public investor disclosure and regulatory disclosure. The Wall of Shame entry for Change Healthcare was filed in late 2024; UnitedHealth had disclosed the breach to investors in February 2024 under the new SEC rules. The two datasets cover the same event from different disclosure angles.
What the database does not show
The 500-individual threshold creates a significant selection bias. The Wall of Shame captures the largest breaches but is structurally blind to the universe of smaller incidents. A solo practice physician whose scheduling system is compromised, affecting 200 patients, files a report with HHS through the annual log process but does not appear in the public database. OCR's own enforcement data suggests it investigates thousands of complaints per year, the vast majority involving small providers and small incident counts.
The dataset also reflects what organizations discover and report, not what actually occurred. Dwell time in healthcare ransomware cases is typically measured in weeks to months; the date of breach in the CSV is usually the date of discovery or the earliest date the organization can confirm access occurred, not necessarily when the attacker first entered the network. The eight-month dwell time in the Premera case is documented in the resolution narrative; for most entries, the true exposure window is unknown.
Finally, the Individuals Affected count at time of filing is frequently revised upward. Change Healthcare's count moved from a placeholder figure in its initial filing to approximately 100 million in October 2024 and then to approximately 190 million in January 2025. Downstream analysis that uses the CSV without checking for updates will undercount the largest incidents by substantial margins.
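Revisions can be caught by keeping dated snapshots of the CSV and comparing them. The sketch below joins two snapshots and reports rows whose count changed; it assumes that entity name, state, and breach date together identify a report, which is a working assumption rather than a documented key, and the toy rows are invented.

```python
import pandas as pd

# Assumed composite key; the export has no documented unique identifier.
KEY = ["name_of_covered_entity", "state", "date_of_breach"]


def revised_counts(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Rows present in both snapshots whose Individuals Affected changed."""
    merged = old.merge(new, on=KEY, suffixes=("_old", "_new"))
    changed = merged[
        merged["individuals_affected_old"] != merged["individuals_affected_new"]
    ].copy()
    changed["delta"] = (
        changed["individuals_affected_new"] - changed["individuals_affected_old"]
    )
    return changed[KEY + ["individuals_affected_old", "individuals_affected_new", "delta"]]


# Invented snapshots: one entry is revised upward, one is unchanged.
old_snapshot = pd.DataFrame({
    "name_of_covered_entity": ["Vendor A", "Clinic B"],
    "state": ["MN", "OH"],
    "date_of_breach": ["2024-02-21", "2024-03-05"],
    "individuals_affected": [500, 1_000],
})
new_snapshot = old_snapshot.assign(individuals_affected=[100_000_000, 1_000])
revisions = revised_counts(old_snapshot, new_snapshot)
```

Running a comparison like this on every refresh turns silent upward revisions into an explicit change log that downstream totals can be reconciled against.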
Using the data responsibly
Patient data appears in the Wall of Shame in aggregate, not individually — the dataset contains entity names, breach parameters, and counts, not PHI. However, the names of covered entities are public, and some entries involve small practices where the covered entity name combined with the date range and breach type could identify individuals in small communities. This is not a reason to avoid the dataset, but analysts writing about specific small-provider entries should exercise the same caution that applies to any identifiable organizational disclosure.
OCR enforcement is notoriously resource-constrained. The agency has fewer than 300 full-time equivalent staff responsible for enforcing both HIPAA and civil rights law in healthcare programs that collectively represent trillions of dollars in annual activity. Settlement amounts reflect this: the $16 million Anthem settlement sounds large, but spread across the 78.8 million records compromised it works out to about 20 cents per record, and it represents less than two days of Anthem's revenue. The deterrence calculus is unfavorable. The Wall of Shame is, in one reading, a catalog of the price of non-compliance, and that price has historically been low enough that many covered entities have treated it as a cost of doing business rather than an existential risk.