
MSHA Mine Safety Data: Violations, Accidents, and Fatalities Across 10,000 Active Mines

14 min read · AI Analytics
Federal Data · MSHA · Mine Safety · Labor

The Mine Safety and Health Administration publishes three linked public datasets that together document every active mine in the United States, every reportable accident and injury since 1983, and every citation issued to a mine operator since 1983. The datasets are free, bulk-downloadable, and updated quarterly. They are the foundation for every serious investigation of mine safety. They were also publicly available, and already recording the violations that preceded the explosion, years before the Upper Big Branch disaster killed 29 miners in 2010.

This article covers what each of the three MSHA datasets contains and how they link through a common Mine ID. It explains the “significant and substantial” violation designation and why it matters for enforcement, the Pattern of Violations mechanism and its use to force mine closure, and what the Upper Big Branch inspection records showed before the explosion. It then covers commodity-level breakdowns across coal and metal/nonmetal mining, how to access all three datasets via MSHA's public data tool at ard.msha.gov, a Python snippet that joins violations to accidents by Mine ID and calculates S&S violation rate as a leading indicator for fatal accidents, and how journalists use MSHA data to investigate operators with chronic violations before major accidents.

The three datasets and how they link

MSHA's public data is organized around three tables that share a common key: the Mine ID, a unique numeric identifier that MSHA assigns to every mining operation it registers. The Mine ID is permanent: it does not change when a mine changes ownership, changes name, or temporarily suspends operations. This persistence makes Mine ID an unusually reliable join key for longitudinal analysis.
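Because the Mine ID is numeric, a CSV reader can silently parse it as an integer and drop leading zeros, which breaks the join. The sketch below shows one way to guard against that. It assumes IDs are seven characters wide once zero-padded; `normalize_mine_id` is a hypothetical helper and the rows are toy data, not real mines.

```python
import pandas as pd

def normalize_mine_id(series: pd.Series, width: int = 7) -> pd.Series:
    """Return Mine IDs as zero-padded strings (the width is an assumption)."""
    return series.astype(str).str.strip().str.zfill(width)

# Toy data: the violations file was parsed with integer IDs and lost a leading zero.
mines = pd.DataFrame({"MINE_ID": ["0100003", "4600001"], "MINE_NAME": ["A", "B"]})
violations = pd.DataFrame({"MINE_ID": [100003, 4600001, 4600001]})

violations["MINE_ID"] = normalize_mine_id(violations["MINE_ID"])

# validate= raises if the mine register unexpectedly contains duplicate IDs.
joined = violations.merge(mines, on="MINE_ID", how="left", validate="many_to_one")
print(joined)
```

Reading the column with `dtype=str` at load time avoids the problem entirely; the helper is only needed when a file has already been parsed numerically.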

The Mine dataset is the master register. Each row is one mining operation and contains: Mine ID, mine name, operator name and address, commodity code (the principal mineral extracted), mine type (surface or underground), mine status (active, abandoned, temporarily idle, or permanently closed), state, county, MSHA coal or metal/nonmetal district, and employment figures — average number of employees and employee-hours worked in the most recent reporting period. The employment data is self-reported by operators on the MSHA 7000-2 form and is used to compute injury rate denominators. As of 2025, approximately 10,000 mines carry active or temporarily idle status.

The Accident/Injury/Illness dataset contains every reportable accident, injury, and occupational illness since 1983. Each row is one reportable event and carries: Mine ID, accident date, accident type (fall of roof, machinery, explosives, electrical, and roughly 30 additional categories), degree of injury (fatality, days away from work, restricted duty, medical treatment only, first aid), number of injuries, days lost from work, days of restricted duty, the injured worker's occupation, the injured worker's experience at the mine, and a narrative description of the accident taken from the report the operator files with MSHA on Form 7000-1. The narrative field is the most analytically rich: it describes in plain language what happened, what equipment was involved, and what the injured worker was doing at the time. It can be queried with keyword search and processed with natural language tools to extract structured facts about accident causation.
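A keyword search over the narrative field can be sketched with a regular expression. The column name `NARRATIVE`, the keyword list, and the rows below are illustrative assumptions; check the header of the exported file for the actual field name.

```python
import pandas as pd

# Toy rows; real narratives are free text of varying quality.
accidents = pd.DataFrame({
    "MINE_ID": ["0000001", "0000001", "0000002"],
    "NARRATIVE": [
        "Employee was struck by a falling rock from the rib.",
        "EE slipped on ice while exiting haul truck.",
        None,  # narratives can be missing
    ],
})

# Case-insensitive match on a few haulage-related terms; na=False treats
# missing narratives as non-matches instead of propagating NaN.
pattern = r"\b(?:haul truck|loader|conveyor)\b"
mask = accidents["NARRATIVE"].str.contains(pattern, case=False, regex=True, na=False)
haulage_mentions = accidents[mask]
print(haulage_mentions)
```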

The Violations dataset contains every citation issued by an MSHA inspector since 1983. Each row is one citation and carries: Mine ID, violation number, the date the citation was issued, the section of the Federal Mine Safety and Health Act or a specific MSHA safety standard cited as the legal basis for the violation, the “significant and substantial” designation flag, the proposed citation penalty, the final assessed penalty after any contest or settlement, a flag indicating whether the operator contested the citation, and the abatement date by which the hazardous condition must be corrected. The section of law field is the key to hazard-specific analysis: you can compute which legal provisions are most frequently violated, which are most often designated as S&S, and which carry the highest final penalties.
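The section-level analysis described above amounts to a grouped aggregation. This is a minimal sketch on toy rows, using the column names from the script later in this article; the section values and penalties are placeholders, not real citation data.

```python
import pandas as pd

violations = pd.DataFrame({
    "SECTION_OF_ACT": ["75.400", "75.400", "75.403", "75.370", "75.400"],
    "SS": ["Y", "N", "Y", "N", "Y"],
    "FINAL_PENALTY": [1000.0, 100.0, 2000.0, 150.0, 900.0],
})
violations["is_ss"] = violations["SS"].eq("Y")

# One row per cited section: citation count, S&S share, mean final penalty.
by_section = (
    violations.groupby("SECTION_OF_ACT")
    .agg(
        citations=("SS", "size"),
        ss_share=("is_ss", "mean"),
        mean_final_penalty=("FINAL_PENALTY", "mean"),
    )
    .sort_values("citations", ascending=False)
)
print(by_section)
```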

The significant and substantial designation

The most important single field in the Violations dataset is the S&S flag. A violation is designated “significant and substantial” when an MSHA inspector determines that the violation is reasonably likely to result in an injury or illness of a reasonably serious nature. The four-part test for an S&S designation, established by the Federal Mine Safety and Health Review Commission in the 1984 Mathies Coal decision, requires that the violation be of a mandatory safety standard, that there be a discrete safety hazard created by the violation, that the hazard be reasonably likely to cause an injury, and that the potential injury be reasonably serious.

S&S violations carry elevated consequences on two dimensions. First, they attract higher civil penalty assessments than equivalent non-S&S violations under MSHA's penalty point system. Second, and more significantly for enforcement purposes, S&S violations are the unit of measurement for the Pattern of Violations mechanism. Only S&S violations count toward the POV threshold. A mine that accumulates many minor violations will never trigger POV status; a mine that accumulates S&S violations at a high rate can be placed on POV notice regardless of whether it has had a recent accident.

In the Violations dataset, S&S rates vary dramatically across commodity and mine type. Underground coal mines — the highest-hazard category in American mining — carry the highest S&S rates, typically in the range of 40 to 60 percent of all violations in active mines. Surface coal mines carry lower S&S rates, and metal and nonmetal mines lower still, though the distribution varies significantly by specific commodity and geography. A mine's S&S rate over time is one of the strongest quantitative leading indicators of subsequent fatal accident risk available in any federal dataset.
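Comparing S&S rates across commodity and mine type is a join to the mine register followed by a grouped mean. The sketch below uses toy data and the column names from the script later in this article; the category labels are illustrative.

```python
import pandas as pd

mines = pd.DataFrame({
    "MINE_ID": ["0000001", "0000002", "0000003"],
    "COMMODITY": ["Coal", "Coal", "Stone"],
    "MINE_TYPE": ["Underground", "Surface", "Surface"],
})
violations = pd.DataFrame({
    "MINE_ID": ["0000001"] * 4 + ["0000002"] * 2 + ["0000003"] * 4,
    "SS": ["Y", "Y", "N", "N", "Y", "N", "N", "N", "N", "Y"],
})
violations["is_ss"] = violations["SS"].eq("Y")

# Pooled S&S rate per commodity and mine type: every citation counts once,
# so heavily cited mines weigh more than lightly cited ones.
ss_by_group = (
    violations.merge(mines, on="MINE_ID", how="inner")
    .groupby(["COMMODITY", "MINE_TYPE"])["is_ss"]
    .agg(citations="size", ss_rate="mean")
    .reset_index()
)
print(ss_by_group)
```

Averaging per-mine rates instead of pooling citations answers a different question (the typical mine rather than the typical citation), so it is worth stating which one a published figure uses.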

The Pattern of Violations enforcement mechanism

Section 104(e) of the Federal Mine Safety and Health Act authorizes MSHA to initiate Pattern of Violations proceedings against a mine operator whose mine has a history demonstrating a recurring pattern of S&S violations. POV is the most severe enforcement tool available to MSHA short of criminal prosecution. A mine that receives a POV notice under Section 104(e) is subject to immediate withdrawal orders for any subsequent S&S violation found during inspections — meaning miners must evacuate the affected area until the violation is abated, rather than simply receiving a citation with a penalty and an abatement deadline.

POV status is effective at forcing rapid abatement because the economic cost of a withdrawal order to an operating mine is immediate and severe: a productive coal mine pulled from production while a withdrawal order is active loses revenue by the hour. The mechanism was deliberately designed to make continued operation more expensive than compliance.

The criteria for POV notification were revised after the Upper Big Branch disaster revealed that no mine had been placed on POV status under the prior criteria despite chronic S&S violations at many operations. MSHA published revised POV criteria in 2013 using a statistical screening model that compares a mine's S&S violation rate, normalized for inspection hours, against the distribution of similar mines by commodity and mine type. Mines whose S&S rates fall in the highest statistical tier receive additional scrutiny before POV notification. POV notification letters, withdrawal orders issued under POV status, and the list of mines that have received POV notifications are all public records accessible through MSHA's enforcement data portal.
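The idea of comparing a mine against similar mines can be illustrated with a within-group percentile rank. This is only a sketch of the general technique, not MSHA's actual screening criteria: the peer groups, the 90th-percentile cutoff, the minimum group size, and the precomputed `ss_per_100_hours` column are all assumptions.

```python
import pandas as pd

mines = pd.DataFrame({
    "MINE_ID": ["A", "B", "C", "D", "E"],
    "PEER_GROUP": ["UG coal"] * 4 + ["Surface stone"],
    "ss_per_100_hours": [10.0, 2.0, 1.0, 5.0, 5.0],
})

grouped = mines.groupby("PEER_GROUP")["ss_per_100_hours"]
mines["peer_percentile"] = grouped.rank(pct=True)  # 1.0 = worst in its group
mines["peer_count"] = grouped.transform("count")

# Require a minimum group size so a mine with no peers is not
# automatically ranked at the top of a group of one.
flagged = mines[(mines["peer_percentile"] >= 0.9) & (mines["peer_count"] >= 3)]
print(flagged)
```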

Upper Big Branch: what the inspection record showed

On April 5, 2010, an explosion in the Upper Big Branch coal mine in Raleigh County, West Virginia, killed 29 miners — the largest coal mine disaster in the United States in 40 years, and the worst mining disaster of any kind since the 1972 Sunshine Mine fire in Idaho. The explosion was caused by a methane ignition that propagated through an accumulation of coal dust that had not been adequately treated with rock dust to suppress explosive propagation. Both the methane accumulation and the inadequate rock dusting violated MSHA mandatory safety standards.

The mine's MSHA inspection record, available in the public Violations dataset under Mine ID 4608436, documented years of chronic violations before the explosion. In the 18 months preceding the disaster, Upper Big Branch received more than 500 citations from MSHA inspectors. A significant fraction were S&S designations. Violations included repeated citations for inadequate rock dusting, ventilation deficiencies, and accumulations of combustible material, the precise conditions that caused the explosion. The operator, Massey Energy, contested a large share of its citations, a tactic that delayed final penalty assessment and, under the pre-2013 POV criteria, prevented the accumulation of enough uncontested S&S violations to trigger POV proceedings.
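A lookback of this kind is a date-window filter on one Mine ID. The sketch below runs on toy rows with a placeholder Mine ID; `citations_before` is a hypothetical helper, and the column names follow the script later in this article.

```python
import pandas as pd

def citations_before(violations, mine_id, event_date, months=18):
    """Citations issued to one mine in the `months` before `event_date`."""
    end = pd.Timestamp(event_date)
    start = end - pd.DateOffset(months=months)
    in_window = (
        (violations["MINE_ID"] == mine_id)
        & (violations["ISSUE_DT"] >= start)
        & (violations["ISSUE_DT"] < end)
    )
    return violations[in_window]

violations = pd.DataFrame({
    "MINE_ID": ["0000001"] * 4 + ["0000002"],
    "ISSUE_DT": pd.to_datetime(
        ["2008-01-15", "2008-11-01", "2009-06-30", "2010-04-06", "2009-06-30"]
    ),
    "SS": ["Y", "Y", "N", "Y", "Y"],
    "CONTEST_IND": ["N", "Y", "Y", "N", "N"],
})

window = citations_before(violations, "0000001", "2010-04-05")
summary = {
    "citations": len(window),
    "ss_share": window["SS"].eq("Y").mean(),
    "contested_share": window["CONTEST_IND"].eq("Y").mean(),
}
print(summary)
```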

The Mine Safety and Health Administration's investigation, released in 2011, found that Upper Big Branch's violation record was not exceptional among the highest-hazard underground coal mines of that era. Multiple mines operated by large coal companies had comparable S&S rates and comparable patterns of contesting citations to delay penalty finalization. The inspection record did not fail to capture the hazard; the enforcement mechanism lacked the tools to translate a documented hazard record into regulatory intervention before a catastrophic event.

Post-2010 reforms included the revised POV criteria, MSHA's increased focus on pattern screening, and stepped-up inspection frequency at underground coal mines with high S&S rates. Fatal coal mining accidents in the United States declined from 48 in 2010 to fewer than 10 in most years since 2015. The Violations dataset records this trajectory in granular form: at the mines that were most cited in 2007–2010, either S&S rates declined or the mines closed as coal production shifted.

Commodity breakdown: underground coal, surface coal, metal and nonmetal

MSHA's jurisdiction covers all mining, not only coal. The Mine dataset commodity field divides the roughly 10,000 active mines into three broad regulatory categories with distinct hazard profiles, inspection frequencies, and violation patterns.

Underground coal mines are the most intensively regulated category. MSHA is required by statute to inspect every underground coal mine at least four times per year. The primary hazards — methane accumulation, coal dust explosibility, roof fall, and haulage accidents — are the subject of a dense body of mandatory safety standards under 30 CFR Parts 70–90. Underground coal mines account for a disproportionate share of fatal accidents relative to their share of employment: underground coal mining has historically carried a fatal injury rate three to five times the rate of surface mining in the same commodity.

Surface coal mines are required by statute to receive at least two MSHA inspections per year. They operate at lower methane and explosion risk than underground mines but carry significant haulage, machinery, and highwall hazards. The haulage accident category — haul trucks, front-end loaders, belt conveyors — is the largest single cause of serious injury at surface mines. Surface coal mining's fatal injury rate, while lower than underground coal, remains significantly higher than general industry.

Metal and nonmetal mines span a wide range of commodities, including gold, copper, iron ore, stone, sand, gravel, limestone, potash, and dozens of others. The statutory inspection minimums apply by mine type rather than commodity: at least two inspections per year for surface operations and at least four for underground operations. Aggregate mines (stone, sand, and gravel) are the most numerous category within metal/nonmetal; hard-rock metal mines carry the most severe individual accident risk. The persistent challenge in metal/nonmetal is that the sector has not seen the post-2010 fatality decline that characterized coal. Metal/nonmetal fatalities fluctuate year to year but have not trended downward at the same rate, partly because the sector is more fragmented among small operators who receive less frequent scrutiny, and partly because the specific hazards (ground control failure in hard-rock mines, vehicle accidents at aggregate operations) require site-specific engineering controls that are harder to standardize than the rock dusting and ventilation requirements that drive coal enforcement.

How to access the data

All three datasets are available at MSHA's public data tool at ard.msha.gov (the Accident, Respirable Dust, and other data portal, now expanded to cover the full enforcement dataset). The interface allows you to select a dataset, apply date range and state filters, and export to CSV. Exports are not paginated — the full dataset is downloaded in a single file. As of 2025, the full Violations CSV covers more than four million citation records going back to 1983 and is several hundred megabytes uncompressed. The Accidents CSV covers approximately 800,000 reportable events over the same period. Both files are manageable in pandas on a standard laptop with appropriate dtype specification to avoid memory inflation from object columns.
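If the full Violations file does not fit comfortably in memory, one option is to read only the needed columns in chunks and aggregate as you go. The sketch below uses a three-row in-memory CSV as a stand-in for the exported file; the column names follow the script later in this article, and the chunk size is arbitrary.

```python
import io
import pandas as pd

# Stand-in for the exported file; pass the real file path to read_csv instead.
csv_text = (
    "MINE_ID,VIOLATION_NO,SS,ISSUE_DT\n"
    "0000001,1,Y,2009-01-05\n"
    "0000001,2,N,2009-02-10\n"
    "0000002,3,Y,2009-03-01\n"
)

reader = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["MINE_ID", "SS"],                  # skip columns you do not need
    dtype={"MINE_ID": str, "SS": "category"},   # category is compact for flags
    chunksize=2,                                # use a much larger value on real data
)

# Count S&S citations per mine chunk by chunk, then combine the partial counts.
partials = []
for chunk in reader:
    partials.append(chunk.loc[chunk["SS"] == "Y", "MINE_ID"].value_counts())
ss_counts = pd.concat(partials).groupby(level=0).sum()
print(ss_counts)
```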

The MSHA data portal also provides separate downloads for inspection records, employment and production data, dust sample data, and noise sample data. The inspection table links to violations by Mine ID and inspection number; joining inspections to violations reveals the inspection context for each citation — whether it was a regular inspection, a spot inspection, or a fatality investigation. MSHA inspection data is updated on a quarterly cycle. Researchers who require more current data for a specific mine can submit a FOIA request to MSHA for the inspection notes and citation details for a specific Mine ID and date range; MSHA has a public FOIA reading room at msha.gov that includes frequently requested documents.
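Attaching inspection context to each citation is a two-key merge. In this sketch the column names `INSPECTION_NO` and `INSPECTION_TYPE` are hypothetical stand-ins for whatever the inspection export actually calls them, and the rows are toy data.

```python
import pandas as pd

inspections = pd.DataFrame({
    "MINE_ID": ["0000001", "0000001"],
    "INSPECTION_NO": ["100", "101"],
    "INSPECTION_TYPE": ["Regular", "Spot"],
})
violations = pd.DataFrame({
    "MINE_ID": ["0000001", "0000001", "0000001"],
    "INSPECTION_NO": ["100", "101", "101"],
    "VIOLATION_NO": ["9001", "9002", "9003"],
})

# indicator=True adds a _merge column, which makes citations that fail to
# match any inspection record easy to spot.
with_context = violations.merge(
    inspections,
    on=["MINE_ID", "INSPECTION_NO"],
    how="left",
    validate="many_to_one",
    indicator=True,
)
by_type = with_context.groupby("INSPECTION_TYPE").size()
print(by_type)
```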

Python: joining violations to accidents and computing S&S rate

The following script loads all three MSHA datasets, computes a per-mine S&S violation rate from the Violations table, identifies fatal accidents from the Accidents table, and joins both to the Mine master table via Mine ID. The resulting dataframe surfaces mines with both a high S&S rate and at least one fatal accident — the combination that historically precedes Pattern of Violations scrutiny.

import pandas as pd

# Download all three datasets from MSHA's public data tool:
#   https://ard.msha.gov/
# Select "Mines", "Accidents/Injuries/Illnesses", and "Violations"
# and export as CSV. The files are updated quarterly.

mines = pd.read_csv(
    "Mines.csv",
    dtype={"MINE_ID": str, "COMMODITY": str, "MINE_TYPE": str,
           "MINE_STATUS": str, "STATE": str, "COUNTY": str},
    low_memory=False,
)

violations = pd.read_csv(
    "Violations.csv",
    dtype={"MINE_ID": str, "VIOLATION_NO": str, "SECTION_OF_ACT": str,
           "SS": str, "CITATION_PENALTY": float, "FINAL_PENALTY": float,
           "CONTEST_IND": str},
    low_memory=False,
    parse_dates=["ISSUE_DT"],
)

accidents = pd.read_csv(
    "Accidents.csv",
    dtype={"MINE_ID": str, "ACCIDENT_TYPE": str, "DEGREE_INJURY": str,
           "DAYS_LOST": float, "NO_INJURIES": float},
    low_memory=False,
    parse_dates=["ACCIDENT_DT"],
)

# --- S&S violation rate per mine ---
# "SS" field is "Y" when the violation is Significant and Substantial
violations["is_ss"] = violations["SS"].str.upper() == "Y"
ss_rate = (
    violations.groupby("MINE_ID")
    .agg(
        total_violations=("VIOLATION_NO", "count"),
        ss_violations=("is_ss", "sum"),
    )
    .assign(ss_rate=lambda d: d["ss_violations"] / d["total_violations"])
    .reset_index()
)

# --- Fatal accidents per mine ---
# DEGREE_INJURY code "01" = fatality in the MSHA schema
fatal = (
    accidents[accidents["DEGREE_INJURY"] == "01"]
    .groupby("MINE_ID")
    .size()
    .rename("fatal_accidents")
    .reset_index()
)

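# --- Join per-mine violation and fatality summaries to the mine register ---
analysis = (
    mines[["MINE_ID", "MINE_NAME", "COMMODITY", "MINE_TYPE", "STATE"]]
    .merge(ss_rate, on="MINE_ID", how="left")
    .merge(fatal, on="MINE_ID", how="left")
    # Mines with no citations keep ss_rate = NaN: "never cited" is not "0% S&S".
    .fillna({"total_violations": 0, "ss_violations": 0, "fatal_accidents": 0})
)

# Mines with high S&S rates and at least one fatality.
# MIN_VIOLATIONS is an illustrative cutoff that keeps a mine with one or two
# citations from ranking at 100% S&S.
MIN_VIOLATIONS = 20
high_risk = analysis[
    (analysis["total_violations"] >= MIN_VIOLATIONS)
    & (analysis["ss_rate"] >= 0.5)
    & (analysis["fatal_accidents"] >= 1)
].sort_values("ss_rate", ascending=False)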

print(high_risk[["MINE_NAME", "COMMODITY", "STATE",
                  "total_violations", "ss_rate", "fatal_accidents"]].head(20))

The S&S rate computed here is a simple proportion: S&S violations as a share of all violations at the mine over the full history of the record. A more refined leading indicator adjusts for inspection intensity by normalizing S&S violation count against MSHA inspector-hours at the mine (available in the inspections table as a separate field). Mines that accumulate a large number of S&S violations partly because they receive more frequent inspections will have their rate inflated by inspection frequency rather than genuine compliance failure; the normalized rate corrects for this. MSHA's own Pattern of Violations screening model uses the inspection-hour normalized rate for exactly this reason.
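The difference between the simple proportion and an inspection-hour rate can be seen on a two-mine toy example. The `INSPECTION_HOURS` column name and the per-100-hours scaling are assumptions made for illustration.

```python
import pandas as pd

violations = pd.DataFrame({
    "MINE_ID": ["A"] * 10 + ["B"] * 4,
    "SS": ["Y"] * 5 + ["N"] * 5 + ["Y"] * 2 + ["N"] * 2,
})
inspections = pd.DataFrame({
    "MINE_ID": ["A", "A", "B"],
    "INSPECTION_HOURS": [300.0, 200.0, 50.0],
})

per_mine = (
    violations.assign(is_ss=violations["SS"].eq("Y"))
    .groupby("MINE_ID")
    .agg(total_violations=("SS", "size"), ss_violations=("is_ss", "sum"))
    .join(inspections.groupby("MINE_ID")["INSPECTION_HOURS"].sum())
)
per_mine["ss_share"] = per_mine["ss_violations"] / per_mine["total_violations"]
per_mine["ss_per_100_hours"] = (
    100 * per_mine["ss_violations"] / per_mine["INSPECTION_HOURS"]
)
print(per_mine)
```

Both toy mines have the same 50 percent S&S share, but mine B accumulates S&S citations four times as fast per hour of inspection, which the simple proportion hides.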

The post-2010 decline in coal fatalities vs. persistent metal/nonmetal risk

The trajectory of coal mining fatalities after 2010 is one of the clearer stories in the MSHA Accidents dataset. In 2010, the year of Upper Big Branch, 48 coal miners died in U.S. mining accidents. By 2016, the annual coal fatality count had fallen to 8. In most years between 2015 and 2024, fewer coal miners died in all of U.S. mining than died at Upper Big Branch alone. The Accidents dataset records this decline across all accident types: fatal roof falls at underground coal mines declined, fatal haulage accidents at surface coal mines declined, and fatal machinery accidents at coal preparation plants declined.
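An annual fatality series by sector can be built from the Accidents table with a year-by-sector count. In the sketch below the `SECTOR` column is a hypothetical field that would need to be derived from the mine's commodity, and the rows are toy data rather than real counts.

```python
import pandas as pd

accidents = pd.DataFrame({
    "ACCIDENT_DT": pd.to_datetime(
        ["2010-04-05", "2010-04-05", "2010-07-01", "2016-03-02", "2016-08-09"]
    ),
    "DEGREE_INJURY": ["01", "01", "03", "01", "01"],
    "SECTOR": ["Coal", "Coal", "Coal", "Coal", "Metal/Nonmetal"],
})

# "01" is the fatality code used by the script earlier in this article.
fatal = accidents[accidents["DEGREE_INJURY"] == "01"]
by_year = (
    fatal.groupby([fatal["ACCIDENT_DT"].dt.year.rename("year"), "SECTOR"])
    .size()
    .unstack(fill_value=0)  # one column per sector, zero where no fatalities
)
print(by_year)
```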

The drivers of the decline are documented in adjacent datasets. MSHA inspection frequency at underground coal mines increased after 2010, as did the share of inspections conducted as spot or special inspections rather than routine quarterly inspections. The Violations dataset shows a post-2010 increase in citations issued per inspection-hour at underground coal mines, consistent with more thorough inspections finding more violations, and a subsequent decline in S&S rates at the mines with the worst pre-2010 violation records — either because enforcement pressure produced compliance or because the highest-risk mines closed as coal production declined in Appalachia.

Metal and nonmetal mining has not experienced a comparable post-2010 decline. Fatal metal/nonmetal accidents fluctuate between roughly 20 and 40 per year with no clear downward trend visible in the Accidents dataset over the same period. The Violations dataset reveals a structural difference: the S&S violation rate at metal/nonmetal mines is lower in absolute terms than at underground coal mines, and POV proceedings against metal/nonmetal operators are rare. This may reflect genuinely lower hazard concentration in the more heterogeneous metal/nonmetal category, or it may reflect that most metal/nonmetal mines are surface operations, for which the statutory minimum is two inspections per year rather than four, so fewer violations are detected per mine-year. Aggregate operations, the most numerous metal/nonmetal category, are among the most lightly inspected, and haulage accidents at aggregate quarries and pits are the largest single fatal accident category in the metal/nonmetal sector.

Journalist and investigative applications

Investigative journalists have used MSHA data consistently since the public datasets became available in their current form. The workflow is almost always the same: identify a mine where a serious accident has occurred or is being investigated, retrieve its full violation history from the Violations dataset by Mine ID, and compute what the inspection record showed in the months and years before the accident.

The most powerful version of this analysis does not start with the accident — it starts with the violation record. A reporter or analyst who computes S&S violation rates across all active underground coal mines, ranks them, and identifies the top decile has a list of mines where a serious accident is statistically more likely than at comparable mines. That list, cross-referenced with ownership records (available in the Mine dataset's operator name field, though operator name normalization requires the same entity resolution work as any establishment-level analysis), identifies corporate operators who control multiple mines in the high-risk tier. This is the pattern that preceded both Upper Big Branch and the 2006 Sago mine disaster, which killed 12 miners in West Virginia.
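The ranking and operator roll-up can be sketched as a quantile cutoff followed by a crude name normalization. The suffix list in `normalize_operator` (a hypothetical helper) is illustrative and far from complete, the `OPERATOR` column name is assumed, and the twenty mines are synthetic.

```python
import re
import pandas as pd

def normalize_operator(name: str) -> str:
    """Very rough entity key: uppercase, drop punctuation and common suffixes."""
    cleaned = re.sub(r"[^A-Z0-9 ]", "", name.upper())
    cleaned = re.sub(r"\b(LLC|INC|CO|CORP|COMPANY)\b", "", cleaned)
    return " ".join(cleaned.split())

n = 20
mines = pd.DataFrame({
    "MINE_ID": [f"{i:07d}" for i in range(1, n + 1)],
    "OPERATOR": [f"Operator {i}" for i in range(1, n - 1)]
    + ["Acme Coal LLC", "ACME COAL, L.L.C."],
    "ss_rate": [i / n for i in range(1, n + 1)],
})

# Top decile by S&S rate, then count distinct mines per normalized operator.
cutoff = mines["ss_rate"].quantile(0.9)
top = mines[mines["ss_rate"] >= cutoff].copy()
top["operator_key"] = top["OPERATOR"].map(normalize_operator)
mines_per_operator = top.groupby("operator_key")["MINE_ID"].nunique()
multi_mine = mines_per_operator[mines_per_operator >= 2]
print(multi_mine)
```

Without the normalization step the two spellings of the same operator would be counted separately and the multi-mine pattern would be missed.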

The contested citation field in the Violations dataset is a secondary investigative signal. Operators who contest a high share of their citations — particularly their S&S-designated citations — are delaying the finalization of penalties and, as noted above, historically delaying the accumulation of final S&S violation counts toward POV status. The combination of a high S&S rate and a high contest rate is the signature of an operator who is both operating in violation of safety standards and using the legal process to minimize the regulatory consequences. Massey Energy's contest strategy at Upper Big Branch was documented in the post-disaster investigation and is directly visible in the public Violations dataset for that mine.

The penalty gap analysis available in the Violations dataset — comparing proposed citation penalties to final assessed penalties after contest and settlement — adds a financial dimension. Operators who successfully reduce their penalty burden through contest proceedings may face a lower effective cost of noncompliance than the nominal penalty schedule suggests. A mine with a proposed penalty of $200,000 across 40 S&S citations that settles for $60,000 after contest is paying roughly $1,500 per S&S violation — far below the deterrent level MSHA's penalty schedule was designed to produce. This penalty gap analysis is reproducible from the public Violations dataset using the same proposed-to-final penalty comparison available in the OSHA enforcement database.
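The worked example above can be reproduced as a grouped sum. The sketch uses forty synthetic S&S citations that total $200,000 proposed and $60,000 final, with the penalty column names from the script earlier in this article.

```python
import pandas as pd

violations = pd.DataFrame({
    "MINE_ID": ["0000001"] * 40,
    "SS": ["Y"] * 40,
    "CITATION_PENALTY": [5000.0] * 40,  # proposed: 40 x $5,000 = $200,000
    "FINAL_PENALTY": [1500.0] * 40,     # final:    40 x $1,500 = $60,000
})

gap = (
    violations[violations["SS"] == "Y"]
    .groupby("MINE_ID")
    .agg(
        ss_citations=("SS", "size"),
        proposed=("CITATION_PENALTY", "sum"),
        final=("FINAL_PENALTY", "sum"),
    )
)
gap["final_per_citation"] = gap["final"] / gap["ss_citations"]
gap["share_paid"] = gap["final"] / gap["proposed"]
print(gap)
```

On real data, citations that are still under contest may have no final penalty yet, so it is worth deciding up front whether to exclude them or treat them separately.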

Finally, the MSHA data becomes significantly more powerful when joined to external datasets. Mine ownership records from SEC filings and state corporate registrations enable enterprise-level violation analysis when the operator name field in the Mine dataset is insufficient for entity resolution. Production data from the Energy Information Administration's coal production reports, joinable by mine, enables injury rate normalization by production volume rather than employee-hours — useful because employee-hours can be manipulated through contractor reclassification that moves injury exposures off an operator's MSHA record. The resulting multi-dataset view is the most complete picture of mine safety risk available from public records.
