OSHA Severe Injury Reports: The Federal Record of Amputations and Hospitalizations Since 2015

Somewhere in the country, a machine catches a hand, a fall sends a worker to the hospital, a flying fragment takes an eye—and within 24 hours, by federal rule, the employer has to pick up the phone and tell OSHA. Since January 1, 2015, every one of those calls has become a row in a public dataset: roughly 103,000 work-related amputations, eye losses, and in-patient hospitalizations, each naming the employer, the place, the industry, the body part, and what the agency did next. It is the closest thing the United States has to a near-real-time federal ledger of the moment work turns dangerous.

This article covers what the Severe Injury Reports dataset is and the 29 CFR 1904.39 reporting rule that created it; what changed on January 1, 2015, when OSHA replaced a narrow trigger (a fatality, or three or more workers hospitalized in a single incident) with a requirement to report individual severe injuries within hours; the fields each report carries, including employer name and address, NAICS industry, incident date, affected body part, and the nature and source of the injury; the pivotal choice OSHA makes in response to each report, between opening its own inspection and directing the employer to conduct a Rapid Response Investigation; the two coverage caveats that dominate interpretation, namely the State Plan jurisdiction gap and OSHA's own finding of substantial underreporting; how the data joins to industry classifications and geography and what that supports analytically; a Python workflow that loads the downloadable file and aggregates by sector, body part, state, and response type; and the limitations every analyst must internalize before drawing conclusions.

What the dataset is

The Severe Injury Reports dataset is OSHA's record of the individual serious workplace injuries that employers are required to report to the agency. The category is precise and set by regulation: a work-related amputation, a work-related loss of an eye, or a work-related in-patient hospitalization. These are the “severe injuries” the rule names: injuries serious enough to cost a limb, an eye, or a hospital admission, but distinct from a fatality, which carries its own faster reporting deadline and its own data. Each qualifying event becomes one report, and the public dataset assembles those reports into a single table beginning January 1, 2015, the day the reporting expansion took effect, and running to the present. OSHA publishes the complete file as a download on osha.gov, alongside a dashboard for browsing it.

In our database this record is stored as the table osha_severe_injuries, with the grain of one row per reported event: a single amputation at a single plant on a single day is one row, and an employer that reports three separate hospitalizations over a year contributes three rows. The dataset comprises roughly 103,000 reports across the decade, dominated by amputations and in-patient hospitalizations and concentrated in manufacturing, construction, and warehousing. The columns describe the injured worker's employer, where, in what industry, how, and what OSHA did about it:

employer                   -- name of the reporting employer
address / city / state     -- location of the establishment
zip / latitude / longitude -- geocoded site (accuracy not guaranteed)
naics                      -- primary industry classification of the employer
event_date                 -- date the incident occurred
hospitalized               -- count of in-patient hospitalizations
amputation                 -- count of amputations
part_of_body_title         -- the affected body part (e.g. finger, hand, eye)
nature_title               -- the nature of the injury (amputation, fracture...)
event_source_title         -- the object/source that caused it (machine, fall...)
inspection                 -- OSHA inspection id, if OSHA opened an inspection
rri                        -- Rapid Response Investigation id, if employer-led

Three groups of columns do the analytical work. The who and where (employer, address, geocode, and NAICS) is what lets an analyst attribute an injury to a named establishment, an industry, and a place, and the NAICS code in particular is the join key that connects a single event to the structure of an entire industry. The what happened (the body part, the nature of the injury, and the source) is a compact three-field anatomy of the event: a finger (part_of_body_title) amputated (nature_title) by a press (event_source_title). These three fields use OSHA's standardized occupational-injury-and-illness coding, so they are comparable across all of the roughly 103,000 records. And the OSHA response (the inspection and rri identifiers) records the single most consequential decision the agency makes about each report, the choice between investigating the incident itself and handing the investigation back to the employer, which the next sections take up in detail.

The 2015 reporting rule and what changed

The dataset exists because of a single, consequential amendment to OSHA's recordkeeping regulation. The governing provision is 29 CFR 1904.39—“Reporting fatalities, hospitalizations, amputations, and losses of an eye to OSHA.” Effective January 1, 2015, OSHA revised that rule to require every employer under its jurisdiction to report, within fixed and short deadlines, the severe injuries an individual worker suffers. Under the revised rule an employer must report a work-related fatality within 8 hours, and a work-related in-patient hospitalization, amputation, or loss of an eye within 24 hours of learning of it. The report goes directly to OSHA—by phone to the area office or the national hotline, or through an online form—and it is these 24-hour severe-injury reports, not the fatalities, that populate the Severe Injury Reports dataset.
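The two deadlines can be written down as a small lookup. The sketch below is illustrative only: the event-type labels and the function name are hypothetical, and it counts both windows from the moment the employer learns of the event, which is a simplification of the rule's wording.

```python
from datetime import datetime, timedelta

# Reporting windows in hours, as stated in the text for the revised rule.
# The dictionary keys are hypothetical labels chosen for this sketch.
REPORTING_WINDOW_HOURS = {
    "fatality": 8,
    "inpatient_hospitalization": 24,
    "amputation": 24,
    "eye_loss": 24,
}


def reporting_deadline(event_type: str, employer_learned_at: datetime) -> datetime:
    """Latest time a report is due, counted from when the employer learned of the event."""
    return employer_learned_at + timedelta(hours=REPORTING_WINDOW_HOURS[event_type])


learned = datetime(2024, 3, 4, 14, 30)
print(reporting_deadline("amputation", learned))  # 2024-03-05 14:30:00
print(reporting_deadline("fatality", learned))    # 2024-03-04 22:30:00
```

Only the three 24-hour categories feed the Severe Injury Reports dataset; the fatality entry is included here just to show the contrast in windows.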

To grasp why this was a watershed, it helps to know what the rule replaced. Before 2015, the reporting obligation was far narrower: an employer had to report only a fatality or the in-patient hospitalization of three or more employees from a single incident. A lone worker losing a hand, an arm, or an eye, or a single worker admitted to a hospital, triggered no report at all. The practical effect was that the entire universe of individual severe injuries was invisible to OSHA in real time; the agency learned of them, if at all, only later and indirectly, through the annual injury logs employers keep. The 2015 change inverted that. By lowering the trigger to a single severe injury and attaching a 24-hour deadline to it, the rule created, for the first time, a near-real-time federal stream of individual severe-injury events, flowing in continuously rather than surfacing in retrospect. That is the dataset's defining quality and the reason it is analytically distinct from the older, establishment-level injury logs.
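The change in trigger is easiest to see as two predicates applied to the same incident. This is a minimal sketch built only from the description above; the Incident class and both function names are hypothetical, and the real rule carries definitions and exceptions that are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Counts of outcomes from one work-related incident (hypothetical structure)."""
    fatalities: int = 0
    hospitalized: int = 0
    amputations: int = 0
    eye_losses: int = 0


def reportable_before_2015(i: Incident) -> bool:
    # Old trigger: any fatality, or three or more workers hospitalized in one incident.
    return i.fatalities >= 1 or i.hospitalized >= 3


def reportable_from_2015(i: Incident) -> bool:
    # New trigger: any fatality, or any single hospitalization, amputation, or eye loss.
    return (i.fatalities >= 1 or i.hospitalized >= 1
            or i.amputations >= 1 or i.eye_losses >= 1)


cases = {
    "one worker loses a finger": Incident(amputations=1),
    "one worker hospitalized": Incident(hospitalized=1),
    "three workers hospitalized": Incident(hospitalized=3),
}
for label, incident in cases.items():
    print(f"{label:<28} before 2015: {reportable_before_2015(incident)!s:<5} "
          f"from 2015: {reportable_from_2015(incident)}")
```

The first two cases are the ones the old rule missed entirely, and they are the events that now make up the dataset.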

The rule rests on the broader statutory architecture of the Occupational Safety and Health Act of 1970, which created OSHA and charged it with assuring safe and healthful working conditions by setting and enforcing standards. Reporting obligations of this kind serve two purposes the Act contemplates: they give the agency the information it needs to target its limited enforcement resources toward the most dangerous workplaces, and they create a factual record of where and how serious harm is occurring. The 2015 expansion was, in effect, a bet that knowing about individual severe injuries quickly—rather than learning of patterns only in aggregate, long after the fact—would let OSHA intervene where intervention could still prevent the next one.

Anatomy of a report: body part, nature, and source

What makes the dataset more than a list of names and dates is its standardized description of each injury, captured in three coded fields that together reconstruct the mechanics of the event. They are not free text; they draw on OSHA's occupational injury and illness classification, the same coding scheme that underlies the agency's broader injury statistics, which is what makes roughly 103,000 disparate incidents comparable.

The part of body field records the anatomy affected. In a dataset dominated by amputations this skews heavily toward the upper extremities (fingers above all, then hands, then arms) because the prototypical severe injury in American industry is a finger or hand caught in moving machinery. The nature field records the injury itself: amputation, fracture, crushing injury, burn, and so on. Amputations and in-patient hospitalizations are the two qualifying categories that account for most events in the dataset, so the nature field is how an analyst separates the amputation stream from the broader stream of injuries serious enough to require hospitalization (a fall fracture, a thermal burn, a crushing injury). The source field records what caused the harm, whether an object, a substance, or an exposure: a specific kind of machine, a fall to a lower level, vehicular equipment, a structure or surface. Read together, the three fields turn a bare report into a diagnosis: not merely that a worker was hurt, but that a worker's finger was amputated by a power press, or that a worker was hospitalized after a fall from a scaffold. That diagnostic structure is what makes the dataset a tool for hazard analysis rather than only a tally of harm.
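Reading the three fields together amounts to counting (body part, nature, source) triples. The sketch below uses a handful of made-up rows with simplified category labels; the real file's code titles are longer and should be inspected before any string matching like the one used here.

```python
from collections import Counter

# Made-up rows for illustration; labels are simplified, not OSHA's exact titles.
reports = [
    {"part_of_body_title": "Finger", "nature_title": "Amputation", "event_source_title": "Power press"},
    {"part_of_body_title": "Finger", "nature_title": "Amputation", "event_source_title": "Power press"},
    {"part_of_body_title": "Leg", "nature_title": "Fracture", "event_source_title": "Scaffold"},
    {"part_of_body_title": "Hand", "nature_title": "Amputation", "event_source_title": "Conveyor"},
]


def anatomy_counts(rows):
    """Count each (body part, nature, source) combination."""
    return Counter(
        (r["part_of_body_title"], r["nature_title"], r["event_source_title"])
        for r in rows
    )


def split_streams(rows):
    """Separate the amputation stream from other reports using the nature field."""
    amputations = [r for r in rows if "amputation" in r["nature_title"].lower()]
    others = [r for r in rows if "amputation" not in r["nature_title"].lower()]
    return amputations, others


for (part, nature, source), n in anatomy_counts(reports).most_common():
    print(f"{n:>3}  {part} / {nature} / {source}")

amputations, others = split_streams(reports)
print(len(amputations), "amputation reports;", len(others), "other reports")
```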

OSHA's response: inspection versus Rapid Response Investigation

The most analytically interesting feature of the dataset is not the injury itself but what OSHA does about it. Each report confronts the agency with a choice, and the dataset records which way it went. Because the 2015 rule generates far more reports than OSHA has inspectors to investigate, the agency cannot send a compliance officer to every reported severe injury. It triages.

One path is the OSHA inspection: the agency opens its own investigation, dispatches a compliance safety and health officer to the establishment, examines the conditions that led to the injury, and—if it finds violations of OSHA standards—issues citations and proposes penalties. An inspection is the agency's full enforcement instrument, and the presence of an inspection identifier on a severe-injury report is the link from that report into OSHA's separate enforcement and inspection data. The other path is the Rapid Response Investigation (RRI). Instead of sending its own officer, OSHA directs the employer to investigate its own incident—to identify the cause, describe the hazard, and report back on what it has done or will do to prevent a recurrence. The RRI is, in effect, employer-led self-investigation under agency direction, a way to extract a corrective response from many more reports than OSHA could ever inspect directly.

The split between these two responses is one of the central facts the dataset documents, and it is genuinely consequential. The Rapid Response Investigation was a deliberate policy innovation built to make the flood of new reports manageable—an acknowledgment that a 24-hour, single-injury reporting rule would surface far more events than the inspectorate could chase. Its defenders argue it engages employers in fixing their own hazards and multiplies OSHA's reach; its critics argue that handing the investigation back to the very employer whose workplace produced the injury is a thin form of oversight, dependent on the employer's candor and effort. Whichever view one takes, the inspection and rri identifiers in the data let an analyst measure the balance directly: what share of reported severe injuries draws an actual OSHA inspection, what share is referred back to the employer, and how that mix varies by industry, by state, and by the severity or type of injury. That measurement is one of the clearest things the dataset makes possible and one of the few public windows into how a resource-constrained enforcement agency actually allocates its attention.
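Measuring that balance by industry is a grouped share. The following is a sketch on a toy frame, assuming the identifier columns are named inspection and rri as in the schema above and that a non-empty value means the corresponding response occurred; the sector column and all values are invented for illustration.

```python
import pandas as pd

# Toy data: sector codes and identifiers are invented for illustration.
toy = pd.DataFrame({
    "sector": ["31", "31", "31", "23", "23", "48"],
    "inspection": ["1500001", "", None, "1500002", "1500003", ""],
    "rri": ["", "R-1", "R-2", "", "", "R-3"],
})


def response_mix(frame: pd.DataFrame, by: str) -> pd.DataFrame:
    """Share of reports with an inspection id and with an RRI id, per group."""
    flags = pd.DataFrame({
        by: frame[by],
        "inspected": frame["inspection"].fillna("").str.strip().ne(""),
        "rri": frame["rri"].fillna("").str.strip().ne(""),
    })
    out = flags.groupby(by).agg(
        reports=("inspected", "size"),
        inspected_share=("inspected", "mean"),
        rri_share=("rri", "mean"),
    )
    return out.sort_values("reports", ascending=False)


mix = response_mix(toy, "sector")
print(mix)
```

The same function works with a state column as the grouping key, subject to the coverage caveats that apply to any state comparison in this data.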

Coverage: the State Plan gap

Before any number drawn from this dataset can be interpreted nationally, one structural fact has to be understood: the data does not cover the whole country. The Occupational Safety and Health Act permits states to run their own occupational safety and health programs in place of federal OSHA, under federally approved State Plans. A substantial number of states operate such plans—some covering both private and public sector workers, some covering only state and local government employees—and these State Plan states administer their own reporting, enforcement, and recordkeeping.

The consequence for this dataset is direct and important. OSHA's Severe Injury Reports represent incidents under federal OSHA jurisdiction. The State Plan states run parallel reporting requirements—most adopting the same or stricter severe-injury reporting—but they have not all fed their severe-injury data into this single federal dataset, and the coverage of State Plan states in the file is incomplete and inconsistent. The practical upshot is that the dataset undercounts the nation: large, heavily industrial states that operate their own plans may be sparsely represented or absent, and a national total computed from the file is not a national total at all but a federal-jurisdiction total. Any state-by-state comparison must account for who is under federal jurisdiction and who is not, or it will mostly be measuring the map of State Plans rather than the distribution of injuries. This is the first thing to check before treating a count as comprehensive, and it is the reason the Python workflow below frames its state aggregation as federal-jurisdiction-dominated rather than as a clean national ranking.
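One way to keep the jurisdiction gap visible is to split state counts into two groups before ranking anything. The sketch below is deliberately incomplete: the set of State Plan states is a three-state illustrative subset, not the full list, which should be taken from OSHA's current State Plans page. It also assumes the state field holds two-letter abbreviations, and the counts are invented.

```python
# ILLUSTRATIVE, INCOMPLETE subset of State Plan states covering private-sector
# workers. Replace with the full current list from OSHA before any real use.
ILLUSTRATIVE_STATE_PLAN_STATES = frozenset({"CA", "MI", "WA"})


def split_by_jurisdiction(state_counts, state_plan_states):
    """Split {state: count} into federal-jurisdiction and State Plan groups."""
    federal, state_plan = {}, {}
    for state, n in state_counts.items():
        key = state.strip().upper()
        target = state_plan if key in state_plan_states else federal
        target[key] = target.get(key, 0) + n
    return federal, state_plan


# Invented counts, for illustration only.
counts = {"TX": 900, "OH": 600, "CA": 40, "WA": 12}
federal, state_plan = split_by_jurisdiction(counts, ILLUSTRATIVE_STATE_PLAN_STATES)
print("rank these:", sorted(federal, key=federal.get, reverse=True))
print("interpret separately:", sorted(state_plan))
```

Ranking only the federal-jurisdiction group avoids reading a State Plan state's low count as a low injury burden.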

Coverage: self-reporting and underreporting

The second coverage caveat is just as important and cuts in the same direction. The dataset is self-reported. Every row exists because an employer complied with the rule and reported the injury to OSHA. There is no independent sensor on the nation's workplaces; the data captures the injuries employers tell the agency about, and only those. The rule is mandatory, but that dependence on employers actually complying with it opens a gap between the injuries that qualify under the rule and the injuries that get reported.

That gap is not hypothetical. OSHA's own analysis of the program found substantial underreporting: a meaningful fraction of qualifying severe injuries are never reported, whether through employer ignorance of the rule, deliberate non-compliance, or uncertainty about whether a particular injury crosses the reporting threshold. The agency has been candid that the reported counts understate the true incidence of severe injury. For an analyst, this has a precise and unforgiving implication: the dataset is a floor, not a census. Every count it yields—by industry, by body part, by state—is a lower bound on the real number, and the degree of undercount is not uniform. Industries and employers that are conscientious about compliance will be more fully represented than those that are not, which means apparent differences between industries or firms can reflect differences in reporting discipline as much as differences in actual injury rates. The absence of reports from an establishment is emphatically not evidence that the establishment is safe. Treating reported counts as if they were complete incidence figures is the single most common way to misread this data.
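The effect of uneven reporting on a comparison can be shown with simple arithmetic. In the sketch below the reporting fractions are purely hypothetical (the text gives no figure for the reporting rate), and the function name is invented; the point is only that if a fraction p of qualifying injuries is reported, the implied incidence is reported / p, so different values of p can reverse an apparent ranking.

```python
def implied_incidence(reported: int, reporting_fraction: float) -> float:
    """Implied number of qualifying injuries if only a fraction were reported."""
    if not 0 < reporting_fraction <= 1:
        raise ValueError("reporting_fraction must be in (0, 1]")
    return reported / reporting_fraction


# Hypothetical industries, counts, and reporting fractions, for illustration only.
scenarios = {
    "industry A": {"reported": 100, "reporting_fraction": 0.8},
    "industry B": {"reported": 80, "reporting_fraction": 0.5},
}

for name, s in scenarios.items():
    implied = implied_incidence(s["reported"], s["reporting_fraction"])
    print(f"{name}: reported {s['reported']}, implied about {round(implied)}")
```

In this invented case industry A reports more injuries but industry B has more of them once the assumed reporting gap is applied, which is the sense in which reported counts are a floor rather than a ranking.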

Joining to industry, geography, and the wider OSHA record

The severe-injury data is most powerful not in isolation but joined to the other structures its fields point to, and three joins matter most.

The first is to industry, through the NAICS code. Every report carries the employer's North American Industry Classification System code, and that code is the bridge from an individual injury to the economic structure of an entire sector. Aggregating reports by the two-digit NAICS sector immediately reveals the concentration the dataset is known for: manufacturing (NAICS 31–33), construction (23), and transportation and warehousing (48–49) dominate the amputation and hospitalization counts, because these are the sectors where workers interact with powered machinery, work at height, and handle heavy materials. Drilling into the more detailed NAICS codes localizes the hazard further—to food manufacturing, to fabricated metal products, to specific construction trades, to warehousing operations—and joining the counts to industry employment from the Bureau of Labor Statistics converts raw report counts into rates per worker, which is the only way to compare the danger of a large industry against a small one fairly.
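Two details in that join are easy to get wrong: several sectors span more than one two-digit prefix, and raw counts need an employment denominator. The sketch below maps only the prefixes named above to sector labels and computes a rate per 10,000 workers. The counts and employment figures are invented placeholders, and real denominators would come from Bureau of Labor Statistics data.

```python
# Two-digit NAICS prefixes for the sectors named in the text. Manufacturing and
# transportation/warehousing each span several prefixes and must be collapsed.
SECTOR_NAMES = {
    "23": "Construction",
    "31": "Manufacturing",
    "32": "Manufacturing",
    "33": "Manufacturing",
    "48": "Transportation and warehousing",
    "49": "Transportation and warehousing",
}


def sector_of(naics) -> str:
    """Map a NAICS code of any length to a sector label (others left unmapped)."""
    digits = "".join(ch for ch in str(naics) if ch.isdigit())
    return SECTOR_NAMES.get(digits[:2], "Other or unmapped")


def rate_per_10k(report_count: int, employment: int) -> float:
    """Reports per 10,000 workers."""
    return 10_000 * report_count / employment


# Invented placeholder figures, for illustration only.
reports_by_sector = {"Manufacturing": 50, "Construction": 30}
employment_by_sector = {"Manufacturing": 100_000, "Construction": 40_000}

print(sector_of("332710"), "|", sector_of("493110"), "|", sector_of("722511"))
for sector, n in reports_by_sector.items():
    print(f"{sector}: {rate_per_10k(n, employment_by_sector[sector]):.1f} per 10,000")
```

In the placeholder figures the smaller sector has fewer reports but the higher rate, which is the comparison raw counts cannot make.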

The second join is to geography. Each report carries a state, an address, and a geocode, so the data can be mapped—subject always to the State Plan caveat that constrains what a map can honestly claim. Within federal-jurisdiction states, geographic aggregation supports the kind of place-based analysis that connects severe injuries to local industrial composition and to the distribution of OSHA's own enforcement attention. The third join is to OSHA's enforcement and inspection data through the inspection identifier. When a severe-injury report triggers an OSHA inspection, the inspection identifier links the report to the full inspection record—the violations cited, the standards involved, the penalties proposed—closing the loop from the injury that prompted the inspection to the enforcement outcome it produced. That linkage is what lets an analyst ask whether the injuries that draw inspections are the ones that yield citations, and what kinds of violations the severe-injury stream surfaces.
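The inspection link is an ordinary left join on the identifier. The sketch below uses two toy tables; the inspection table's column names (activity_nr, violations_cited) are assumptions about how an enforcement extract might be laid out, and the employers and values are fictional.

```python
import pandas as pd

# Fictional severe-injury rows; an empty inspection id means no inspection.
sir = pd.DataFrame({
    "employer": ["Acme Stamping", "Beta Roofing", "Gamma Logistics"],
    "inspection": ["1500001", "1500002", ""],
})

# Hypothetical inspection extract: one row per inspection.
inspections = pd.DataFrame({
    "activity_nr": ["1500001", "1500002"],
    "violations_cited": [3, 0],
})

linked = sir.merge(
    inspections,
    how="left",                # keep every severe-injury report
    left_on="inspection",
    right_on="activity_nr",
    validate="many_to_one",    # fail loudly if inspection ids repeat on the right
    indicator=True,            # adds a _merge column: "both" or "left_only"
)

print(linked[["employer", "inspection", "violations_cited", "_merge"]])
matched = int((linked["_merge"] == "both").sum())
print(f"{matched} of {len(linked)} reports matched an inspection record")
```

Keeping identifiers as strings on both sides avoids silent mismatches from leading zeros or float conversion.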

Analytical uses

A near-real-time, event-level, industry-and-geography-coded record of severe workplace injuries supports a distinctive set of analyses, each of which has to be conducted with the two coverage caveats firmly in mind.

Hazard profiling by industry is the most immediate use. Combining the NAICS code with the body-part, nature, and source fields produces a precise picture of how workers get hurt in a given sector—which machines amputate which fingers in food manufacturing, which falls hospitalize which workers in construction, which materials-handling tasks crush which limbs in warehousing. This is the diagnostic that points safety programs and OSHA's own emphasis programs toward specific, addressable hazards rather than generic exhortation. Trend monitoring over time exploits the dataset's near-real-time character: because reports flow in continuously and carry an incident date, the data can track whether severe injuries in a sector are rising or falling, whether a new hazard is emerging, or whether an enforcement initiative is followed by a change in reports—always remembering that a change in reported counts can reflect a change in reporting behavior as much as in true incidence.
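Trend monitoring reduces to counting reports per month from the incident date. The sketch below assumes dates arrive as month/day/year text, which should be checked against the actual file; unparseable values are dropped and months with no reports are filled with zero so that gaps stay visible. The dates are invented.

```python
import pandas as pd

# Invented incident dates; the m/d/Y text format is an assumption about the file.
events = pd.DataFrame({
    "event_date": ["1/5/2015", "1/20/2015", "2/3/2015", "not a date", "4/9/2015"],
})

dates = pd.to_datetime(events["event_date"], format="%m/%d/%Y", errors="coerce")
print("unparseable dates dropped:", int(dates.isna().sum()))

# Count reports per calendar month.
monthly = dates.dropna().dt.to_period("M").value_counts().sort_index()

# Reindex over the full month range so empty months appear as zero.
full_range = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")
monthly = monthly.reindex(full_range, fill_value=0)

print(monthly)
```

The most recent months in a real extract are likely to be incomplete, so a falling tail should not be read as a falling trend without checking the file's through-date.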

Measuring OSHA's enforcement posture uses the inspection-versus-RRI split to ask how the agency allocates its scarce attention—what share of reports it inspects, how that share differs across industries and states, and whether the most severe injuries are the ones it chooses to investigate itself. Finally, equity and worker-protection analysis combines the data's geography and industry detail with external data on who works in the highest-hazard, lowest-reporting corners of the economy—the temporary, contract, and low-wage workers concentrated in exactly the sectors and the establishments least likely to report fully—surfacing the populations the data captures least well and protects least, which is precisely where the undercount and the harm overlap.

Python workflow: loading and aggregating the severe-injury file

The script below downloads OSHA's complete Severe Injury Reports file, resolves the (slightly version-dependent) column names defensively, and computes the core aggregations: reports by two-digit NAICS sector, the most-affected body parts, reports by state, and the critical inspection-versus-Rapid-Response-Investigation split. No API key is required; the file is a public download. Because OSHA refreshes the file periodically and ships the full dataset as a date-stamped ZIP archive, the loader sniffs the content and unwraps the zip, and the exact download URL should be confirmed against the current SIR dashboard page before a production run. Requirements: requests and pandas.

import requests, io, zipfile
import pandas as pd

# OSHA publishes the complete Severe Injury Reports file as a download on
# osha.gov, via the "Download the full SIR data set" button on the Severe
# Injury Dashboard. The dataset begins January 1, 2015, the day the 29 CFR
# 1904.39 reporting expansion took effect, and runs to the present, one row
# per reported event. The full file is distributed as a date-stamped ZIP
# (the filename embeds the through-date), so the exact URL changes between
# refreshes -- isolate it here and confirm against the current SIR dashboard.
SIR_URL = "https://www.osha.gov/sites/default/files/January2015toOctober2025.zip"


def load_sir(url=SIR_URL):
    r = requests.get(url, timeout=300)
    r.raise_for_status()
    raw = r.content
    # The full SIR download ships as a .zip; sniff the magic bytes and unwrap.
    # (If a refresh ever serves a bare CSV, fall through and read it directly.)
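    if raw[:2] == b"PK":
        with zipfile.ZipFile(io.BytesIO(raw)) as zf:
            name = next((n for n in zf.namelist() if n.lower().endswith(".csv")), None)
            # Fail with a readable message if the archive layout ever changes.
            if name is None:
                raise ValueError(f"no CSV member in archive: {zf.namelist()}")
            raw = zf.read(name)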
    return pd.read_csv(io.BytesIO(raw), dtype=str, low_memory=False)


# Column labels have drifted slightly across SIR releases; resolve the
# working name defensively rather than hard-coding a single spelling.
def col(frame, *candidates):
    lower = {c.lower().strip(): c for c in frame.columns}
    for cand in candidates:
        if cand.lower() in lower:
            return lower[cand.lower()]
    raise KeyError(f"none of {candidates} in {list(frame.columns)[:14]}...")


df = load_sir()
print(f"Severe-injury reports loaded: {len(df):,}")

c_naics = col(df, "NAICS", "PrimaryNAICS", "Primary NAICS")
c_state = col(df, "State", "Employer State")
c_part  = col(df, "Part of Body Title", "BodyPart", "Part of Body")
c_insp  = col(df, "Inspection", "Inspection Id", "InspectionID")
c_rri   = col(df, "RRI", "Rapid Response Investigation", "RRI Id")

# --- 1. Reports by NAICS two-digit sector ------------------------------
# The first two NAICS digits name the sector (31-33 manufacturing,
# 23 construction, 48-49 transportation/warehousing, and so on).
df["sector"] = df[c_naics].fillna("").str.replace(r"\D", "", regex=True).str[:2]
print("\nTop 12 sectors by report count:")
for sec, n in df["sector"].value_counts().head(12).items():
    print(f"  NAICS {sec or '??':<4} {n:>7,}")

# --- 2. Most-affected body parts ---------------------------------------
print("\nTop 12 affected body parts:")
for part, n in df[c_part].fillna("(unknown)").value_counts().head(12).items():
    print(f"  {part[:34]:<34} {n:>6,}")

# --- 3. Reports by state (federal-jurisdiction states dominate) --------
print("\nTop 12 states by report count:")
for st, n in df[c_state].fillna("(blank)").value_counts().head(12).items():
    print(f"  {st:<4} {n:>7,}")

# --- 4. OSHA response: inspection vs Rapid Response Investigation ------
# A non-empty inspection id means OSHA opened its own inspection; an RRI
# value means the employer was directed to investigate and report back.
opened_insp = df[c_insp].fillna("").str.strip().ne("").sum()
opened_rri  = df[c_rri].fillna("").str.strip().ne("").sum()
total = len(df)
print("\nOSHA response to reported events:")
print(f"  OSHA inspection opened:        {opened_insp:>7,} ({opened_insp/total:.1%})")
print(f"  Rapid Response Investigation:  {opened_rri:>7,} ({opened_rri/total:.1%})")

Two things about this script deserve emphasis. First, the inspection-versus-RRI calculation treats a non-empty inspection identifier as an OSHA-led investigation and a non-empty RRI identifier as an employer-led one; this is the right first pass, but the two are not always mutually exclusive across the life of a report, and a rigorous analysis should examine how the two fields co-occur and how the agency's response evolved as it gained experience with the post-2015 volume. Second, and far more important, every count this script produces—by sector, by body part, by state—must be read through the two coverage caveats. The state ranking is a ranking of federal-jurisdiction reporting, not of national injury incidence, and is heavily shaped by which states run their own plans; and every count is a floor, depressed by the documented underreporting, not a complete census. The script computes honest aggregates of what was reported; turning those into claims about what actually happened requires the interpretive discipline the next section sets out.
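Checking how the two identifiers co-occur is a two-by-two table. The sketch below runs on a toy frame with the schema's inspection and rri column names; the values are invented, and on the real file the resolved column names from the script would be substituted.

```python
import pandas as pd

# Invented identifiers for illustration.
toy = pd.DataFrame({
    "inspection": ["1500001", "", "1500002", None, ""],
    "rri": ["", "R-1", "R-2", "R-3", ""],
})


def present(series: pd.Series) -> pd.Series:
    """True where the identifier is non-empty after trimming whitespace."""
    return series.fillna("").str.strip().ne("")


has_insp = present(toy["inspection"]).map(
    {True: "inspection", False: "no inspection"}
).rename("OSHA inspection")
has_rri = present(toy["rri"]).map(
    {True: "rri", False: "no rri"}
).rename("Rapid Response Investigation")

table = pd.crosstab(has_insp, has_rri)
print(table)
```

The cell where both are present and the cell where neither is present are the ones a simple pair of percentages hides, and both are worth reporting alongside the headline shares.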

Limitations and analytical caveats

The Severe Injury Reports dataset is the most detailed public record of individual severe workplace injuries in the United States, but it carries structural limitations that an analyst must internalize before drawing any conclusion from it.

It covers federal jurisdiction, not the nation. Because the State Plan states administer their own programs and have not all fed this single federal dataset, the file represents incidents under federal OSHA jurisdiction and systematically undercounts the country. A national total computed from it is a federal-jurisdiction total, and any cross-state comparison that ignores which states run their own plans will largely be measuring the geography of State Plans rather than the geography of injury. This is the first and most consequential thing to control for.

It is self-reported and substantially under-reported. Every row depends on an employer choosing to comply, and OSHA's own analysis has found that a meaningful fraction of qualifying injuries never get reported. The reported counts are therefore a floor, not a census, and the undercount is uneven: more complete where compliance is conscientious, sparser where it is not. The corollary is unforgiving: the absence of reports from an industry or an establishment is not evidence of safety, and apparent differences between them can reflect reporting discipline rather than real differences in harm.

A report is an event, and the codes are summaries. The dataset records reported events, not adjudicated facts: the body-part, nature, and source codes compress the circumstances of an injury into standardized categories, and the richer texture of what happened—exactly how the incident occurred, what was done about it—lives in the underlying inspection or Rapid Response Investigation file, not in the severe-injury extract. The codes are excellent for comparison across many records and poor for reconstructing the particulars of any one. And an RRI referral, importantly, is not an enforcement finding: it records that OSHA directed the employer to investigate, not that any violation was established.

Counts need rates, and geocodes need care. Raw report counts by industry mostly track the size of the industry; comparing the danger of one sector against another requires normalizing by employment, which means joining out to labor-force data the dataset does not contain. And while the file carries latitude and longitude, OSHA cautions that geocode accuracy is not guaranteed, so fine-grained spatial analysis should treat the coordinates as approximate and fall back on the address and state fields where precision matters.
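A basic screen on the coordinates can be applied before any mapping. The sketch below only rejects values that cannot be valid (missing, non-numeric, out of range, or the 0,0 placeholder) and falls back to the state field otherwise. It does not verify that a point is actually at the employer's address, and the function names and row layout are hypothetical.

```python
import math


def usable_coordinates(lat_text, lon_text):
    """Return (lat, lon) as floats, or None if the pair is clearly unusable."""
    try:
        lat, lon = float(lat_text), float(lon_text)
    except (TypeError, ValueError):
        return None
    if math.isnan(lat) or math.isnan(lon):
        return None
    if lat == 0 and lon == 0:          # common placeholder for a failed geocode
        return None
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon


def locate(row):
    """Prefer screened coordinates; otherwise fall back to the state field."""
    coords = usable_coordinates(row.get("latitude"), row.get("longitude"))
    if coords is not None:
        return ("point", coords)
    return ("state", row.get("state"))


print(locate({"latitude": "41.88", "longitude": "-87.63", "state": "IL"}))
print(locate({"latitude": "", "longitude": "", "state": "TX"}))
```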

Held with these caveats in mind, the osha_severe_injuries table is a uniquely valuable resource: a near-real-time, event-level, industry- and geography-coded record of the amputations, eye losses, and hospitalizations that the 2015 reporting rule first made visible—a federal ledger of the moment American work turns dangerous, read honestly as the reported floor it is rather than the complete count it can never be.

Related writing

OSHA 300A Injury and Illness Data: The Federal Database Behind Establishment-Level Workplace Injury Rates — The establishment-level companion to the severe-injury stream: where Severe Injury Reports capture individual events in near-real time, the 300A summaries give the annual injury and illness totals and hours-worked denominators that turn severe-injury counts into rates per worker.

FMCSA Crash Data: The Federal Database Behind Large Truck and Bus Crashes — A parallel federal record of serious harm on the job, this time on the road: like the severe-injury data it is event-level, carrier-identified, and shadowed by underreporting and state-reporting variation that every analyst has to control for.

PHMSA Pipeline Safety Data: The Federal Database Behind Gas and Liquid Pipeline Incidents — Another safety-regulator incident dataset built on operator self-reporting above a defined threshold, where the same questions—what triggers a report, how the agency responds, and what the counts leave out—govern how the data can be read.