Workplace safety violations: using OSHA inspection and citation data to find dangerous employers
Since 1972, the Occupational Safety and Health Administration has recorded every workplace inspection it has conducted, every safety and health standard it cited, every penalty it proposed, and every penalty that survived employer negotiation. That record — more than 2.5 million inspections across half a century — is public, downloadable in bulk, and almost entirely ignored by the analysts who would most benefit from it. The dataset does not merely show which employers got caught violating safety law. It shows which employers get caught repeatedly, which violations carry paper penalties that vanish in settlement, and which industries are inspected at rates far below their injury burden.
This article covers the structure of the OSHA enforcement database, the distinction between inspection types and what it means for citation rates, the gap between proposed and final penalties, the repeat violator problem, how to access the data, and three research applications that surface patterns not visible in any other federal dataset.
What the enforcement database covers
The Occupational Safety and Health Act of 1970 created two parallel enforcement systems. Federal OSHA directly covers most private-sector workplaces across 28 states and territories. An additional 22 states and two territories operate their own OSHA-approved programs, known as state plans, which must be at least as effective as federal OSHA. State-plan states are required to submit their inspection and citation data to OSHA's central database, the Integrated Management Information System (IMIS), which feeds the public enforcement data portal.
The result is a dataset that covers most private-sector workplace inspections in the United States from 1972 to the present, across all industries, including construction, manufacturing, agriculture, healthcare, oil and gas, and general industry. Federal, state, and local government workplaces are excluded from federal OSHA jurisdiction; some state-plan states cover their own public-sector employers and submit that data as well.
The public enforcement database is maintained at the OSHA Enforcement Data portal at enforcedata.dol.gov. Bulk CSV downloads cover the full inspection and citation history. The OSHA developer API at developer.dol.gov provides programmatic access to the same underlying data with field-level documentation. As of 2025, the database contains records of approximately 2.5 million inspections, roughly 3 million citations, and citation-level penalty data going back to the program's first full year of operation.
Inspection types: programmed and unprogrammed
Every inspection in the database carries an inspection type code that distinguishes programmed from unprogrammed activity. This distinction is analytically critical because the two inspection types produce radically different citation rates and reflect entirely different enforcement logics.
Programmed inspections are planned in advance without reference to a specific complaint or incident. OSHA generates programmed inspection lists using injury and illness data from employer-submitted OSHA 300 logs and Bureau of Labor Statistics survey data, targeting establishments in high-hazard industries or with injury rates above industry averages. The Site Specific Targeting (SST) program, which drives most programmed general industry inspections, selects establishments from the OSHA 300A annual summary data. Local Emphasis Programs (LEPs) and National Emphasis Programs (NEPs) generate programmed inspection lists targeting specific hazards: silica exposure in foundries, process safety management at refineries, heat illness in outdoor industries, and so on.
Unprogrammed inspections respond to a specific triggering event. The four primary triggers recorded in the database are: employee complaints (the most common), referrals from another agency or from the employer itself, follow-up inspections to verify abatement of previously cited violations, and accident or fatality investigations. Fatality investigations are classified separately within the unprogrammed category and are identifiable by the inspection type code and the fatality/catastrophe flag field.
Citation rates differ systematically between the two types. Programmed inspections, conducted at establishments selected for high injury rates or high-hazard industry classification, find violations at lower rates than accident investigations. This is not evidence that programmed inspections are wasteful. It reflects selection: accident investigations go to workplaces where something has already gone wrong, and those are disproportionately workplaces where violations existed. Programmed inspections find violations before an incident has occurred, so a hazard abated after a programmed inspection may prevent an injury that, by construction, never appears in any dataset.
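The comparison between inspection types can be sketched as a grouped rate. This is a minimal illustration on invented rows, not real OSHA data; the field names insp_type and violation_count are assumptions and may not match the portal's actual column names.

```python
from collections import defaultdict

# Invented sample rows; field names are assumptions for illustration only.
inspections = [
    {"insp_type": "programmed", "violation_count": 0},
    {"insp_type": "programmed", "violation_count": 2},
    {"insp_type": "complaint", "violation_count": 1},
    {"insp_type": "accident", "violation_count": 3},
    {"insp_type": "accident", "violation_count": 1},
]

def citation_rate_by_type(rows):
    """Share of inspections of each type that produced at least one citation."""
    totals = defaultdict(int)
    cited = defaultdict(int)
    for row in rows:
        totals[row["insp_type"]] += 1
        if row["violation_count"] > 0:
            cited[row["insp_type"]] += 1
    return {t: cited[t] / totals[t] for t in totals}

rates = citation_rate_by_type(inspections)
print(rates)
```

Because the two types are selected so differently, the resulting rates describe the selection process as much as the workplaces, and should be reported separately rather than pooled.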
Data structure: three linked tables
The OSHA enforcement data is organized across three linked tables: establishment data, inspection data, and violation data. Understanding the join structure is necessary before any analysis.
The establishment table contains the employer record: establishment name (free text, not normalized), street address, city, state, zip code, Standard Industrial Classification (SIC) code for older records, and NAICS code for inspections from 2003 onward. The establishment also carries a size field recording the number of employees at the time of inspection. This field is self-reported by the employer and is frequently inaccurate for large multi-site employers, where the employee count may reflect only the inspected location rather than the enterprise.
The inspection table links to the establishment by establishment ID and records: inspection open date, inspection close date, inspection type code (programmed vs. unprogrammed subtype), inspection scope (comprehensive, partial, records-only), number of violations found by classification, total initial penalty proposed, total final penalty assessed after any contest or settlement, and whether the inspection resulted in a referral to another agency. The inspection table also carries the fatality/catastrophe flag and the number of injuries or illnesses that triggered an unprogrammed inspection.
The violation table is the most granular and most analytically valuable. Each row is a single citation item within a single inspection. Fields include: the specific regulatory standard cited (a CFR citation identifying the exact requirement violated), the violation classification, the initial penalty proposed for that item, the final penalty after any informal conference or contest proceedings, the abatement date (the deadline by which the employer must correct the hazard), and whether the violation was contested. The regulatory standard field is the key to industry-specific pattern analysis: you can compute which standards are most frequently violated, in which industries, and at what penalty levels.
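The join structure described above can be sketched with plain dictionaries. The records are invented, and the key names (estab_id, insp_id) and field names are assumptions standing in for whatever identifiers the downloaded files actually use.

```python
# Invented records; key and field names are assumptions for illustration.
establishments = {101: {"name": "ACME FOUNDRY INC", "naics": "331511"}}
inspections = {
    9001: {"estab_id": 101, "open_date": "2019-03-04", "insp_type": "programmed"},
}
violations = [
    {"insp_id": 9001, "standard": "1910.212(a)(1)", "viol_type": "serious",
     "initial_penalty": 12000.0, "current_penalty": 7200.0},
    {"insp_id": 9001, "standard": "1910.147(c)(4)", "viol_type": "serious",
     "initial_penalty": 9000.0, "current_penalty": 9000.0},
]

def flatten(violations, inspections, establishments):
    """Attach inspection and establishment fields to each citation item."""
    rows = []
    for v in violations:
        insp = inspections.get(v["insp_id"])
        if insp is None:
            continue  # orphaned citation item; skip rather than guess
        estab = establishments.get(insp["estab_id"], {})
        rows.append({**estab, **insp, **v})
    return rows

flat = flatten(violations, inspections, establishments)
print(len(flat), flat[0]["name"], flat[0]["standard"])
```

The result has one row per citation item, which is the grain most of the analyses in this article need. Inspection-level totals should be computed from the inspection table directly, since summing them over flattened rows would double count.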
Violation classifications follow a four-tier taxonomy defined by the OSH Act. Willful violations are those where the employer knew the condition violated a standard, or showed plain indifference to employee safety. Serious violations are conditions where there is a substantial probability of death or serious physical harm and the employer knew or should have known of the hazard. Repeat violations are citations for substantially similar conditions at the same employer within five years of a final order on a prior citation. Other-than-serious violations are conditions that bear directly on safety or health but are unlikely to cause death or serious physical harm. Maximum penalties, adjusted annually for inflation under a requirement enacted in the Bipartisan Budget Act of 2015, are $16,550 per serious or other-than-serious violation and $165,514 per willful or repeat violation as of January 2025. These caps apply per citation item, not per inspection.
The penalty gap
The single most revealing field comparison in the OSHA database is the gap between initial_penalty and current_penalty at the violation level. The initial penalty is what OSHA proposed at the time of citation. The current penalty is what the employer actually paid after any informal conference, settlement negotiation, or formal contest proceeding before the Occupational Safety and Health Review Commission (OSHRC).
Across the full enforcement database, the average reduction from initial to final penalty is 40–70%, depending on violation classification and industry. Willful citations — the most serious category, carrying maximum penalties that can reach six figures per item — are reduced by the largest absolute amounts. An employer cited for a $136,000 willful violation routinely settles the citation at $50,000–$70,000 through the informal conference process, which occurs before any formal contest is filed. This is not unique to OSHA; it reflects a standard feature of administrative enforcement: the proposing agency sets initial penalties at levels that leave room for settlement.
The penalty gap matters analytically because it means initial penalty totals dramatically overstate the financial consequences of OSHA enforcement. A press release announcing a $2 million proposed penalty for a construction fatality will typically produce a final settlement in the $600,000–$900,000 range. Researchers analyzing the deterrent effect of OSHA penalties must use the current penalty field, not the initial penalty field, because it is current penalties that employers actually pay and that rational-actor models of deterrence respond to.
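One way to compute the penalty gap is a dollar-weighted percent reduction per violation classification. The sketch below uses invented citation items; the field names initial_penalty and current_penalty follow the names used above, while viol_type is an assumed name for the classification field.

```python
from collections import defaultdict

def reduction_by_class(items):
    """Dollar-weighted percent reduction from initial to current penalty."""
    initial = defaultdict(float)
    current = defaultdict(float)
    for it in items:
        if it["initial_penalty"] <= 0:
            continue  # no proposed penalty, so no reduction to measure
        initial[it["viol_type"]] += it["initial_penalty"]
        current[it["viol_type"]] += it["current_penalty"]
    return {k: 100.0 * (initial[k] - current[k]) / initial[k] for k in initial}

# Invented citation items for illustration only.
items = [
    {"viol_type": "willful", "initial_penalty": 100000.0, "current_penalty": 50000.0},
    {"viol_type": "serious", "initial_penalty": 10000.0, "current_penalty": 6000.0},
    {"viol_type": "serious", "initial_penalty": 10000.0, "current_penalty": 10000.0},
]
gap = reduction_by_class(items)
print(gap)
```

Summing dollars before dividing weights large citations more heavily than averaging per-item percentages would. Either choice is defensible, but the two should not be mixed within one analysis.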
A secondary analytical use of the penalty gap is identifying industries and employer types where the gap is unusually large. Employers represented by specialized OSHA defense counsel achieve larger reductions than unrepresented employers. Large national employers achieve larger reductions than small local employers. This creates a regressive enforcement dynamic: the employers most able to pay full penalties are the most effective at negotiating them down.
The repeat violator problem
The “repeat” violation classification in the database identifies employers who were cited for substantially similar conditions within five years of a prior final order. But the formal repeat classification understates repeat violations for a structural reason: OSHA's repeat violation determination requires that the prior citation became a final order at the same establishment or at another establishment of the same employer in the same OSHA area office jurisdiction. Multi-state employers with violations across many area office jurisdictions accumulate repeat conditions that are never classified as repeat violations in the database because prior citations occurred in a different area office.
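A rough way to surface these unclassified repeat conditions is to pair citations of the same standard at the same employer that fall within five years of a prior final order but come from different area offices. The sketch below rests on several assumptions: the field names are hypothetical, employer_id presumes entity resolution has already been done, and an identical standard citation is only a proxy for "substantially similar conditions."

```python
from collections import defaultdict
from datetime import date

FIVE_YEARS_DAYS = 5 * 365  # approximation; ignores leap days

def cross_office_candidates(citations):
    """Pair same-employer, same-standard citations issued by different
    area offices within five years of the prior citation's final order."""
    by_key = defaultdict(list)
    for c in citations:
        by_key[(c["employer_id"], c["standard"])].append(c)
    pairs = []
    for group in by_key.values():
        group.sort(key=lambda c: c["issued"])
        for i, prior in enumerate(group):
            for later in group[i + 1:]:
                gap_days = (later["issued"] - prior["final_order"]).days
                if (0 <= gap_days <= FIVE_YEARS_DAYS
                        and prior["area_office"] != later["area_office"]):
                    pairs.append((prior["citation_id"], later["citation_id"]))
    return pairs

# Invented records; office labels are placeholders, not real office codes.
citations = [
    {"citation_id": "A", "employer_id": "E1", "standard": "1926.501(b)(13)",
     "area_office": "X", "issued": date(2016, 5, 1), "final_order": date(2016, 8, 1)},
    {"citation_id": "B", "employer_id": "E1", "standard": "1926.501(b)(13)",
     "area_office": "Y", "issued": date(2019, 2, 10), "final_order": date(2019, 4, 1)},
    {"citation_id": "C", "employer_id": "E1", "standard": "1926.501(b)(13)",
     "area_office": "Y", "issued": date(2020, 6, 1), "final_order": date(2020, 9, 1)},
]
pairs = cross_office_candidates(citations)
print(pairs)
```

The pair B and C is excluded because both come from the same office, where the formal repeat classification would be expected to apply. The output is a list of candidates for manual review, not a finding that a repeat violation occurred.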
OSHA addressed this partially through the Severe Violator Enforcement Program (SVEP), launched in 2010. SVEP designates employers who receive willful or repeat citations for high-gravity serious violations, or who are cited for fatalities, as enhanced enforcement targets. SVEP-designated employers are subject to mandatory follow-up inspections at all of their establishments across all area office jurisdictions, not just the establishment where the original violation occurred. The SVEP designation list is public and separately downloadable from the OSHA enforcement data portal.
The egregious violator penalty policy provides an additional analytical signal. Under the egregious policy, OSHA cites each exposed employee as a separate violation rather than grouping all exposed employees under a single citation item. An employer with 40 workers exposed to an unguarded machine receives 40 separate willful citations rather than one, multiplying the penalty by 40. Egregious citations are identifiable in the violation table by a flag field and by the unusually high citation counts relative to the number of hazards identified in the inspection narrative. They represent OSHA's highest-confidence finding of deliberate disregard for employee safety.
How to get the data
The primary access point for OSHA enforcement data is the enforcedata.dol.gov portal, which provides bulk CSV downloads of the inspection, violation, and accident tables. Downloads are updated weekly. The inspection CSV contains the establishment-level fields and inspection-level summary data; the violation CSV contains citation-level detail joined to inspection ID. Both files are large: the full inspection history CSV is several gigabytes uncompressed, and the violation file is larger.
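Because the bulk files are too large to load comfortably into memory on a modest machine, one option is to stream them row by row. The sketch below tallies a single column using only the standard library; the column names in the sample are assumptions, and the in-memory string stands in for a real download.

```python
import csv
import io
from collections import Counter

def count_by_field(lines, field):
    """Stream rows from any iterable of CSV lines and tally one column."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[field]] += 1
    return counts

# Stand-in for a multi-gigabyte file. With a real download, pass
# open(path, newline="", encoding="utf-8") instead of the StringIO.
sample = io.StringIO(
    "activity_nr,site_state,insp_type\n"
    "1,TX,programmed\n"
    "2,TX,complaint\n"
    "3,OH,programmed\n"
)
by_state = count_by_field(sample, "site_state")
print(by_state.most_common())
```

Only one row is held in memory at a time, so the same function works unchanged on the full inspection or violation file.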
The OSHA developer API at developer.dol.gov provides RESTful access to the same underlying data with field-level documentation. The API is rate-limited but supports filtered queries by NAICS code, date range, state, and violation classification — useful for retrieving targeted subsets without downloading the full bulk files. API keys are available free through a registration process at developer.dol.gov.
The Injury Tracking Application (ITA) at osha.gov/injuryreporting provides establishment-level injury and illness rate data submitted by employers under the OSHA 300A annual summary requirement. The ITA data is a separate dataset from the enforcement database but is the primary tool for identifying establishments with injury rates high enough to qualify for programmed inspection targeting. Joining ITA injury rate data to the enforcement database by establishment name and address produces the analytical foundation for Site Specific Targeting analysis: you can identify establishments with high injury rates that have not been inspected recently and that therefore represent uninspected risk.
Three research use cases
Construction fatality patterns and the Fatal Four
Construction is consistently the most dangerous major industry in the United States by absolute fatality count. OSHA's analysis of construction fatalities identifies four leading causes — the “Fatal Four” — that account for roughly 60% of construction worker deaths: falls, struck-by-object, caught-in/between, and electrocution. Each of the Fatal Four maps to specific OSHA construction standards under 29 CFR Part 1926.
The OSHA enforcement database enables fatality pattern analysis at a level of specificity the aggregated BLS Census of Fatal Occupational Injuries does not support. Using the fatality/catastrophe flag in the inspection table, filtering to construction (NAICS sector 23), and joining to the violation table produces a dataset of every fatal construction inspection since the early 1970s, with the specific standards cited at each fatality, the initial and final penalties, whether the employer contested, and how long abatement took. This dataset reveals which standards are cited most frequently at fatal inspections, which are cited at highest penalty (a proxy for OSHA's confidence in willfulness), and which are most frequently reduced to zero through informal conference. The last pattern suggests that employers know certain citations are difficult for OSHA to sustain if contested.
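The filter-and-join step can be sketched as follows. The records are invented, and fat_cat_flag, naics, insp_id, and standard are assumed field names rather than the portal's documented ones.

```python
from collections import Counter

def fatal_construction_standards(inspections, violations):
    """Count standards cited at fatality-flagged construction inspections."""
    fatal_ids = {
        i["insp_id"] for i in inspections
        if i["fat_cat_flag"] and str(i["naics"]).startswith("23")
    }
    return Counter(v["standard"] for v in violations if v["insp_id"] in fatal_ids)

# Invented records for illustration only.
inspections = [
    {"insp_id": 1, "naics": "238160", "fat_cat_flag": True},
    {"insp_id": 2, "naics": "238160", "fat_cat_flag": False},
    {"insp_id": 3, "naics": "311615", "fat_cat_flag": True},
]
violations = [
    {"insp_id": 1, "standard": "1926.501(b)(13)"},
    {"insp_id": 1, "standard": "1926.503(a)(1)"},
    {"insp_id": 2, "standard": "1926.501(b)(13)"},
    {"insp_id": 3, "standard": "1910.212(a)(1)"},
]
top = fatal_construction_standards(inspections, violations)
print(top.most_common())
```

Older records carry SIC rather than NAICS codes, so a full-history version of this filter would need a second branch for the SIC construction codes.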
Agriculture, poultry, and meatpacking inspection rates vs. injury rates
Meatpacking and poultry processing plants have historically reported among the highest injury and illness rates of any manufacturing sector while receiving inspection rates far below what those injury rates would suggest under OSHA's Site Specific Targeting methodology. The reasons are partly jurisdictional — some meatpacking facilities in state-plan states receive less frequent federal scrutiny than they would under direct federal OSHA coverage — and partly political: the meat and poultry industry has successfully opposed OSHA ergonomics standards that would reduce the repetitive motion injuries that drive the illness rate.
The analytical approach joins ITA establishment-level injury rates to OSHA inspection records by establishment identifier, filters to NAICS 112 (animal production and aquaculture), 3116 (animal slaughtering and processing), and related codes, and computes an inspection rate per 100 establishment-years. The resulting inspection intensity ratio (inspections per unit of reported injury burden) reveals which parts of the agricultural supply chain receive enforcement attention proportionate to their recorded hazard level and which do not. COVID-19 transmission data from 2020–2021, cross-referenced with OSHA inspections of food processing facilities during the same period, is a specific application of this methodology that has been used by academic researchers to assess the relationship between inspection activity and worker infection rates.
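The ratio can be defined in more than one way. The sketch below takes one plausible reading: inspections per 100 establishment-years, divided by the sector's reported injury rate. The function name, the formula, and the sample numbers are all illustrative assumptions.

```python
def inspection_intensity(n_inspections, n_establishments, years, injury_rate):
    """Return (inspections per 100 establishment-years,
    that rate divided by the reported injury rate)."""
    estab_years = n_establishments * years
    insp_rate = 100.0 * n_inspections / estab_years
    return insp_rate, insp_rate / injury_rate

# Invented sector totals: a high-injury sector and a low-injury sector.
high_injury = inspection_intensity(30, 200, 5, injury_rate=6.0)
low_injury = inspection_intensity(80, 400, 5, injury_rate=2.0)
print(high_injury, low_injury)
```

In this invented example the high-injury sector receives 3 inspections per 100 establishment-years against 4 for the low-injury sector, and its intensity ratio is a quarter as large. That kind of disproportion is what the analysis is meant to surface.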
Federal contractor compliance and SAM.gov cross-reference
Federal contractors are required to certify compliance with OSHA standards as a condition of contract award. The System for Award Management (SAM.gov) maintains the Exclusions database of debarred and suspended contractors. Cross-referencing OSHA enforcement records with SAM.gov by employer name and address produces a list of federal contractors who have received willful or repeat OSHA citations while simultaneously holding federal contracts. This cross-reference is useful for federal procurement policy research and for identifying contractors who have certified OSHA compliance despite active enforcement actions in the period immediately prior to or during contract performance.
The entity resolution challenge is significant: contractor names in SAM.gov are the legal entity names of the contracting entity, which may differ from the establishment name in the OSHA database, which may in turn differ from the doing-business-as name at the inspected worksite. A practical approach uses the Employer Identification Number (EIN), which appears in some OSHA enforcement records and in SAM.gov registrations, as the primary join key, falling back to fuzzy name matching for records without EINs. The Federal Procurement Data System (FPDS) provides additional contractor records with DUNS/UEI numbers that can be joined to SAM.gov for a more complete contractor universe.
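The EIN-first, fuzzy-fallback approach can be sketched with the standard library's difflib. The record layout, the 0.85 threshold, and the company names and EINs are all invented for illustration; a production pipeline would likely block on state or ZIP code and use a dedicated record-linkage tool.

```python
from difflib import SequenceMatcher

def match_contractor(osha_rec, sam_records, threshold=0.85):
    """Return (matched SAM record, method), or (None, None) if no match."""
    ein = osha_rec.get("ein")
    if ein:
        for s in sam_records:
            if s.get("ein") == ein:
                return s, "ein"
    # Fall back to the closest name by character-level similarity.
    best, best_score = None, 0.0
    for s in sam_records:
        score = SequenceMatcher(
            None, osha_rec["name"].lower(), s["name"].lower()
        ).ratio()
        if score > best_score:
            best, best_score = s, score
    if best_score >= threshold:
        return best, "fuzzy"
    return None, None

# Invented records; names and EINs are placeholders.
sam = [
    {"name": "Northwind Builders LLC", "ein": "00-0000001"},
    {"name": "Contoso Paving Inc", "ein": "00-0000002"},
]
m1 = match_contractor({"name": "NORTHWIND BLDRS", "ein": "00-0000001"}, sam)
m2 = match_contractor({"name": "Contoso Paving Inc.", "ein": None}, sam)
m3 = match_contractor({"name": "Fabrikam Roofing", "ein": None}, sam)
print(m1[1], m2[1], m3[1])
```

Recording the match method alongside each match lets downstream analysis report EIN-confirmed and fuzzy matches separately, which matters because fuzzy matches carry a false-positive risk that EIN matches do not.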
State-plan vs. federal OSHA
The 22 states and two territories with OSHA-approved state plans are required to submit inspection and citation data to the federal IMIS database, but the data integration is imperfect. State-plan states submit data on a lag, use slightly different field formats for some codes, and have historically underreported data for certain inspection types. California's Division of Occupational Safety and Health (Cal/OSHA) is the largest state plan and has a well-funded, active enforcement program, but its data has historically been among the most inconsistently formatted in the federal database.
The practical consequence for researchers is that raw state-by-state inspection rate comparisons from the federal database are unreliable. A state that appears to have unusually low inspection rates may simply have a data submission lag rather than a genuinely inactive enforcement program. Researchers conducting state-level comparisons should separately obtain enforcement data directly from state-plan agencies to validate against the federal database, particularly for California, Washington, Michigan, and Oregon, which have large and independently maintained enforcement programs.
State-plan states also have discretion on penalty policy within the constraint of being “at least as effective as” federal OSHA. California sets significantly higher maximum penalties than federal OSHA for certain violation classifications. Washington state has historically used different informal conference reduction schedules. These policy differences appear in the penalty fields of the state-submitted data and must be accounted for in any cross-state penalty analysis.
Cross-referencing with related datasets
The OSHA enforcement database produces its strongest analytical results when joined to three adjacent federal datasets that cover overlapping employer populations under different legal frameworks.
The DOL OSHA 300 logs are employer-maintained records of workplace injuries and illnesses, submitted annually as the OSHA 300A summary. Establishments with 250 or more employees, or 20 or more employees in high-hazard industries, submit their 300A summaries to OSHA, which publishes them through the ITA. Matching ITA injury rate data to OSHA enforcement records by establishment enables the inspection intensity analysis described above: identifying establishments with high injury rates that have not been recently inspected and therefore represent unaddressed risk concentrations.
Workers' compensation data is the insurance-sector analog to OSHA enforcement data: it records injuries and illnesses from the employer's insurance perspective rather than the government's enforcement perspective. Workers' comp data is held by state agencies, not a federal database, and access varies widely across states. Where it is available, joining workers' comp claim rates to OSHA enforcement records by employer reveals the gap between reported injury burden and regulatory response: employers with high claim rates but few OSHA inspections are the establishments that the enforcement system has not yet reached.
The Mine Safety and Health Administration (MSHA) database covers mining and quarrying workplaces under a separate statutory scheme, 30 U.S.C. § 801 et seq. MSHA's enforcement data is publicly available at arlweb.msha.gov/OpenGovernmentData and is structurally similar to the OSHA database: establishment records, inspection records, and violation records with penalty fields. Mining employers are exempt from OSHA jurisdiction but subject to MSHA jurisdiction, so MSHA data must be treated as a separate dataset rather than a supplement to the OSHA database.
DOL Wage and Hour Division enforcement data covers minimum wage, overtime, and child labor violations under the Fair Labor Standards Act (FLSA). Employers in agriculture, food processing, and construction — sectors with high OSHA violation rates — also appear frequently in Wage and Hour enforcement data. The Wage and Hour enforcement database is publicly available at dol.gov/agencies/whd/data and includes employer name, violation type, the number of employees affected, and back wages assessed. Joining OSHA enforcement records to Wage and Hour enforcement data by employer name and address surfaces employers who simultaneously violate safety standards and wage payment requirements — a combination most commonly found in subcontracting arrangements where competitive pressure on labor costs produces both wage theft and safety shortcuts.
Limitations
The OSHA enforcement database is the most comprehensive public record of workplace safety enforcement in the United States, but it carries structural limitations that constrain what it can demonstrate.
Complaint-driven bias is the most significant. The majority of OSHA inspections are unprogrammed, and employee complaints are the most common trigger. Workers who are afraid of retaliation, who lack knowledge of OSHA's complaint process, or who work for employers who have successfully suppressed organizing and collective action are far less likely to file complaints. Industries with high rates of immigrant workers, workers on temporary visas, or workers in informal employment relationships (agriculture, domestic work, some construction subcontracting) are systematically underrepresented in the complaint-driven portion of the database relative to their actual hazard exposure.
Small employer underinspection follows from OSHA's resource constraints. With approximately 1,800 federal inspectors covering eight million workplaces, OSHA could inspect each establishment roughly once every 165 years if inspections were distributed uniformly. They are not distributed uniformly — high-hazard industries receive more attention than low-hazard ones — but the coverage ratio means that small employers who are not in targeted industries, who do not generate complaints, and who have not had a reportable fatality or hospitalization are essentially never inspected. The absence of an OSHA record is not evidence of compliance; it is evidence of not having been inspected.
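The coverage figure can be unpacked with simple arithmetic. The sketch below works backward from the three numbers quoted above to the annual inspection volume and per-inspector workload they imply. The derived values are implications of those figures, not official OSHA statistics.

```python
# Figures quoted in the text above.
workplaces = 8_000_000
inspectors = 1_800
years_per_cycle = 165

# Annual inspection volume implied by a 165-year full cycle.
implied_annual = workplaces / years_per_cycle

# Inspections each inspector would complete per year at that volume.
per_inspector = implied_annual / inspectors

print(round(implied_annual), round(per_inspector, 1))
```

Changing any one input shows how sensitive the cycle length is: doubling the inspectorate at the same per-inspector pace would still leave a full cycle of more than 80 years.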
Lagging data for contested citations means the current penalty field for recently issued citations may not reflect final settlement amounts. When an employer contests a citation before the OSHRC, the contest process can take years. During that period, the current penalty field retains the initial proposed penalty. A citation database query for recent years will therefore overstate final penalty amounts for citations that are still in contest, and will understate penalty reductions for citations that settled after the analysis date. Researchers should apply a vintage filter, restricting analysis to citations at least two to three years old, before computing average penalty reduction statistics.
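A vintage filter of this kind can be sketched in a few lines. The field name issued and the three-year default are assumptions; the helper handles the one calendar edge case, an analysis date of February 29.

```python
from datetime import date

def years_before(d, years):
    """Same calendar day a number of years earlier (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # Feb 29 with a non-leap target year
        return d.replace(year=d.year - years, day=28)

def vintage_filter(citations, as_of, min_age_years=3):
    """Keep only citations issued at least min_age_years before as_of."""
    cutoff = years_before(as_of, min_age_years)
    return [c for c in citations if c["issued"] <= cutoff]

# Invented citations for illustration only.
citations = [
    {"id": 1, "issued": date(2019, 6, 1)},
    {"id": 2, "issued": date(2024, 1, 15)},
]
settled = vintage_filter(citations, as_of=date(2025, 3, 1))
print([c["id"] for c in settled])
```

The filter trades recency for reliability: the most recent years drop out of penalty-reduction statistics entirely, which should be stated wherever those statistics are reported.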
Employer name normalization is a persistent data quality problem. Establishment names in the OSHA database are entered by area office staff from whatever the employer calls itself at the inspected site. A single large employer may appear under dozens of name variants — with and without legal suffixes, with varying abbreviations, under subsidiary names, under doing-business-as names that differ from the registered legal name. Any employer-level analysis that counts violations or penalties across establishments requires entity resolution, either through EIN matching for records that carry it or through fuzzy string matching for records that do not.
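A first normalization pass, before any fuzzy matching, is to lowercase names, strip punctuation, and drop trailing legal suffixes. The suffix list below is a small illustrative assumption, not an exhaustive one, and the example names are invented.

```python
import re

# Illustrative, non-exhaustive list of legal suffixes.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "corp", "corporation",
                  "co", "company", "ltd", "lp"}

def normalize_name(name):
    """Lowercase, replace punctuation with spaces, drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

variants = ["Acme Steel, Inc.", "ACME STEEL CO., LLC", "Acme Steel"]
keys = {normalize_name(v) for v in variants}
print(keys)
```

This collapses suffix and punctuation variants only. Subsidiary names and doing-business-as names still require EIN matching or a manually maintained crosswalk.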
Despite these limitations, the OSHA enforcement database is one of the most detailed longitudinal records of corporate safety behavior in any public regulatory dataset. Fifty years of inspection records, linked to specific standards, specific employers, and specific penalty outcomes, is a unique analytical resource. The complaint-driven bias and small employer underinspection mean the database shows where the enforcement system looked, not a complete picture of where hazards exist. Analysts who understand that distinction can use the data to answer questions the enforcement system itself has never systematically asked: which employers appear repeatedly across years and jurisdictions, which violations are cited but never corrected, and which industries receive regulatory attention proportionate to the injuries they generate.