PHMSA Pipeline Safety Data: The Federal Database Behind Gas and Liquid Pipeline Incidents
The Pipeline and Hazardous Materials Safety Administration maintains incident reports for every significant gas and liquid pipeline accident in the United States — spills, explosions, injuries, fatalities, and property damage — creating the most comprehensive public record of pipeline safety performance across 2.7 million miles of US pipeline infrastructure.
What PHMSA Is
The Pipeline and Hazardous Materials Safety Administration is a modal safety agency within the Department of Transportation, created in 2004 by the Norman Y. Mineta Research and Special Programs Improvement Act. The legislation consolidated two predecessor offices — the Research and Special Programs Administration and the DOT's pipeline safety functions — into a dedicated agency with a single safety mission. PHMSA has approximately 500 full-time employees and an annual budget of roughly $170 million, divided between its pipeline safety programs and its hazardous materials transportation programs.
The agency's Office of Pipeline Safety (OPS) administers federal pipeline safety regulations under 49 U.S.C. Chapter 601 and the associated implementing regulations in 49 CFR Parts 190–199. OPS sets minimum federal safety standards for the design, construction, testing, operation, and maintenance of pipeline facilities. The regulatory framework covers four system categories: gas transmission and gathering pipelines, gas distribution pipelines, hazardous liquid pipelines, and liquefied natural gas (LNG) facilities. Each category carries its own subpart of the CFR and its own incident reporting form within the PHMSA OPSWEB reporting system.
PHMSA does not directly inspect every mile of pipeline in the country. Approximately 50 state pipeline safety programs operate under certification agreements with PHMSA, pursuant to 49 U.S.C. § 60105, and carry out inspections on behalf of the federal government for intrastate pipeline facilities. State programs cover roughly 60 percent of gas distribution pipeline mileage in the United States. PHMSA retains exclusive jurisdiction over interstate transmission pipelines and conducts federal inspections of those systems directly. The state–federal partnership means that the enforcement record visible in PHMSA's public data is the federal portion of a larger regulatory apparatus; state enforcement actions are documented separately in each state's program records.
The Incident Database
PHMSA's incident reporting system is built around the concept of a “significant incident” — an accident meeting at least one of a defined set of consequence thresholds that triggers a mandatory written report from the operating company. The thresholds were designed to capture events of public safety and environmental significance while avoiding the administrative burden of reporting every minor release from a network of 2.7 million miles of pipe.
An operator is required to file an incident report for any of the following: a fatality attributable to the pipeline system; a personal injury requiring in-patient hospitalization; estimated property damage of $50,000 or more in original cost (not replacement cost); an unintentional estimated gas release of three million or more cubic feet; a release of five or more barrels of hazardous liquid that reaches a body of water; an unintentional fire or explosion; or any release that results in an emergency shutdown or evacuation of a building. Events below all of these thresholds are classified as reportable events rather than significant incidents and are collected in a separate lower-tier reporting system.
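The thresholds above amount to an any-of test: an event is reportable as a significant incident if at least one condition holds. The following is a minimal sketch of that logic, using the threshold values as stated in this article. The `Event` class and its field names are hypothetical and do not correspond to PHMSA form fields.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical summary of a pipeline event (illustrative field names)."""
    fatalities: int = 0
    hospitalizations: int = 0
    property_damage_usd: float = 0.0
    gas_released_cubic_feet: float = 0.0
    liquid_barrels_to_water: float = 0.0
    fire_or_explosion: bool = False
    shutdown_or_evacuation: bool = False


def meets_reporting_threshold(e: Event) -> bool:
    """Return True if any one of the consequence thresholds is met."""
    return any([
        e.fatalities > 0,
        e.hospitalizations > 0,
        e.property_damage_usd >= 50_000,
        e.gas_released_cubic_feet >= 3_000_000,
        e.liquid_barrels_to_water >= 5,
        e.fire_or_explosion,
        e.shutdown_or_evacuation,
    ])


minor_leak = Event(property_damage_usd=12_000, gas_released_cubic_feet=40_000)
rupture = Event(property_damage_usd=12_000, fire_or_explosion=True)
print(meets_reporting_threshold(minor_leak), meets_reporting_threshold(rupture))
```

Because the test is a logical OR, a low-cost event can still qualify on a single non-monetary criterion such as a fire.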
The reporting forms differ by system type. Gas transmission and gathering operators use Form 7100.2, which captures pipeline operating pressure, pipe vintage, seam type, and prior in-line inspection results for the failed segment. Gas distribution operators use Form 7100.1, which emphasizes excavation depth, locating records, and proximity to buildings. Hazardous liquid operators use Form 7000.1, which includes commodity-specific fields for barrels released, barrels recovered, and net environmental loss. LNG facility operators file on Form 7100.3. All four forms feed into the OPSWEB system and are published to the PHMSA public data portal after processing.
Operators are required to provide telephonic notification to the National Response Center within two hours of discovery of any incident involving a fatality, injury, fire, explosion, or significant release. The written PHMSA report must follow within 30 days. A supplemental report amending the initial filing may be submitted after the internal investigation concludes — and the PHMSA database retains both the original and amended records, with the current-cause field reflecting the most recent determination. For complex incidents, the final cause classification may not appear in the database for 12 to 24 months after the accident date.
Scale of US Pipeline Infrastructure
The United States operates the largest pipeline network in the world. Total pipeline mileage under PHMSA jurisdiction is approximately 2.7 million miles, not including the roughly 300,000 additional miles of service lines connecting distribution mains to individual premises. The infrastructure divides into three major categories by function and regulatory regime.
Gas transmission and gathering comprises approximately 500,000 miles. Transmission pipelines move large volumes of natural gas at high pressure (up to 1,500 psi) from production areas and processing plants to city gate stations where pressure is reduced for local distribution. Gathering pipelines collect gas from wellheads and move it to processing facilities; they operate at lower pressures but traverse remote areas with less frequent monitoring. Gas transmission is the system category with the highest potential for catastrophic incidents — a rupture in a large-diameter high-pressure line can release hundreds of millions of cubic feet of gas in minutes, creating explosion and fire hazards across a wide area. PHMSA's integrity management regulations for gas transmission pipelines in high-consequence areas (HCAs) — areas within a potential impact radius of a populated place or designated sensitive area — are the regulatory core of the transmission safety framework.
Gas distribution accounts for roughly 2.2 million miles, by far the largest share of the US pipeline network. These are the low-pressure systems running under city streets and residential neighborhoods, delivering gas from transmission interconnects to homes, businesses, and industrial customers. Distribution pipelines include a significant legacy stock of cast-iron mains (installed in the 19th and early 20th centuries and subject to mechanical joint leakage) and bare-steel mains without cathodic protection. Gas distribution incidents are dominated by third-party excavation damage — contractors and property owners striking buried pipe without using the 811 call-before-you-dig locating system — and by corrosion failures in aging infrastructure.
Hazardous liquid pipelines total approximately 200,000 miles. This system moves crude oil from production fields and import terminals to refineries; refined petroleum products (gasoline, diesel, jet fuel, and heating oil) from refineries to terminals and distribution points; highly volatile liquids including propane, ethane, and natural gas liquids; and carbon dioxide for industrial use and enhanced oil recovery. Hazardous liquid incidents generate the largest environmental remediation costs in the PHMSA database because liquid spills spread into soil and groundwater in ways that gaseous releases do not. The Enbridge Line 6B spill near Marshall, Michigan — 843,000 gallons of diluted bitumen into the Kalamazoo River — remains the largest inland oil spill in US history and required more than a billion dollars in cleanup costs.
Historical Incidents
Several incidents in the PHMSA database are significant enough to have shaped federal pipeline safety legislation and regulatory priorities. They illustrate the range of failure modes, consequence levels, and accountability mechanisms that the database documents.
San Bruno, California — PG&E gas explosion (2010). On September 9, 2010, a 30-inch natural gas transmission line operated by Pacific Gas and Electric ruptured in a San Bruno residential neighborhood. Eight people were killed, 58 were injured, and 38 homes were destroyed. The fire burned for more than two hours before crews could locate and close the relevant valves. The NTSB investigation concluded that the rupture was caused by a defective seam weld in a section of pipe manufactured in 1956 — a defect that PG&E's integrity management program had failed to detect during prior hydrostatic testing because the test pressure was insufficient to reveal the flaw. PG&E was ultimately convicted on six federal felony counts for record falsification and obstruction. Total financial consequences exceeded $1.6 billion, including a $1.35 billion settlement with the California Public Utilities Commission and a $400 million federal criminal fine. The incident directly prompted the Pipeline Safety, Regulatory Certainty, and Job Creation Act of 2011, which raised PHMSA's maximum civil penalty authority to $200,000 per violation per day.
Shelby County, Alabama — Colonial Pipeline refined products spill (2016). In September 2016, a Colonial Pipeline Company hazardous liquid line carrying gasoline ruptured during a third-party maintenance operation in Shelby County, Alabama, releasing approximately 350,000 gallons of refined product. Colonial Pipeline transports roughly 40 percent of the refined petroleum products consumed on the US East Coast, and the rupture caused temporary fuel supply disruptions and retail price increases across multiple southeastern states. PHMSA issued a corrective action order requiring pressure reductions and accelerated in-line inspection of the affected segment. The incident is recorded in the PHMSA hazardous liquid database with operator-caused excavation damage as the primary cause code.
Marshall, Michigan — Enbridge Line 6B diluted bitumen spill (2010). On July 25, 2010, an Enbridge Energy Partners pipeline known as Line 6B ruptured near Marshall, Michigan, releasing approximately 843,000 gallons of diluted bitumen into Talmadge Creek, a tributary of the Kalamazoo River. The NTSB investigation — one of the most detailed pipeline accident investigations in the board's history — found that Enbridge had identified corrosion anomalies in the line during prior in-line inspections and had failed to assess and remediate them appropriately. The Kalamazoo River spill became the largest and most costly inland oil spill in US history, with remediation costs ultimately exceeding $1.2 billion and extending over multiple years. The PHMSA hazardous liquid database records the incident under external corrosion as the primary cause. The NTSB issued its final report in 2012, and the full accountability record requires cross-referencing the NTSB findings, PHMSA consent agreement, and EPA enforcement action.
Porter Ranch, California — Aliso Canyon SoCalGas blowout (2015). In October 2015, a Southern California Gas Company storage well at the Aliso Canyon underground natural gas storage facility experienced an uncontrolled blowout that released methane continuously for nearly four months before being sealed in February 2016. The California Air Resources Board estimated total methane released at approximately 97,000 metric tons — the largest methane release from a single event in US history and roughly equivalent to the annual greenhouse gas emissions of 572,000 passenger vehicles. The blowout triggered the evacuation of approximately 2,000 households in the Porter Ranch community due to mercaptan odorant causing health complaints. PHMSA's jurisdiction over underground natural gas storage facilities was expanded by the PIPES Act of 2016, enacted specifically in response to the Aliso Canyon incident. The event is documented in the gas transmission database and is the foundational case for federal storage well safety regulations under 49 CFR Part 192 Subpart Y.
Enforcement
PHMSA's enforcement authority rests on 49 U.S.C. § 60122, which authorizes civil penalties against pipeline operators who violate federal safety standards, fail to comply with corrective action orders, or obstruct PHMSA inspections. The maximum civil penalty was raised to $200,000 per violation per day (with a $2 million cap per incident) under the Pipeline Safety, Regulatory Certainty, and Job Creation Act of 2011. Annual inflation adjustments under the Federal Civil Penalties Inflation Adjustment Act Improvements Act of 2015 have brought the per-violation maximum to approximately $266,015 per day as of recent years. PHMSA may also assess penalties for a series of related violations as a single incident subject to the per-incident cap, which frequently limits enforcement exposure for large-scale compliance failures.
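The effect of the per-incident cap is easiest to see with arithmetic. The sketch below uses the 2011 figures cited above ($200,000 per violation per day, $2 million cap) and assumes, as a simplification, that all violations are treated as one related series. Actual assessments depend on statutory factors that this toy function does not model.

```python
PER_VIOLATION_PER_DAY = 200_000  # 2011 Act figure cited in the text
PER_SERIES_CAP = 2_000_000       # cap for a related series of violations


def max_penalty_exposure(violations: int, days: int,
                         per_day: int = PER_VIOLATION_PER_DAY,
                         cap: int = PER_SERIES_CAP) -> int:
    """Upper bound on civil penalty exposure under the simplified model."""
    uncapped = violations * days * per_day
    return min(uncapped, cap)


# Three violations lasting 30 days would be $18 million uncapped,
# but the cap limits exposure to $2 million.
print(max_penalty_exposure(violations=3, days=30))
```

Under these figures the cap binds after ten violation-days, which is why long-running compliance failures face the same maximum exposure as much shorter ones.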
The primary enforcement tools beyond civil penalties are corrective action orders (CAOs) and safety orders. A corrective action order is issued under 49 U.S.C. § 60112 when PHMSA determines that a pipeline facility poses a hazard to life, property, or the environment requiring immediate corrective action. CAOs can require pressure reductions, operational restrictions, accelerated inspections, or complete shutdown of a pipeline segment. They take effect immediately upon issuance without prior notice or opportunity to be heard, though operators may request a hearing within ten days to contest the order. Safety orders, authorized by the PIPES Act of 2006, allow PHMSA to require specific safety improvements without the immediate-hazard finding required for a CAO; they are used for systemic compliance problems that do not present an acute emergency.
For incidents involving willful violations that result in death or serious bodily harm, PHMSA may refer cases to the Department of Justice for criminal prosecution under 49 U.S.C. § 60123. Criminal liability for pipeline safety violations is a felony carrying up to five years' imprisonment, a significantly heavier penalty than the analogous provision in OSHA's statute. PG&E's conviction following the San Bruno explosion involved six felony counts under federal pipeline safety and obstruction statutes.
The Moving Ahead for Progress in the 21st Century Act (MAP-21) of 2012 and subsequent PIPES Acts have progressively expanded PHMSA's authority. The PIPES Act of 2016 extended federal jurisdiction to cover underground natural gas storage facilities following Aliso Canyon, required PHMSA to mandate the use of automatic or remote-controlled shutoff valves on new transmission pipelines, and directed the agency to complete a rulemaking on gas gathering lines, a category that had previously operated with limited federal oversight despite significant mileage in densely drilled production areas. The PIPES Act of 2020 further expanded safety requirements for gas gathering lines in Class 2 and higher locations and required integrity assessments for certain distribution pipeline segments.
Data Access
PHMSA publishes its incident and infrastructure data through several access channels with different trade-offs between completeness and ease of use.
The primary public portal is at phmsa.dot.gov/data-and-statistics. The data and statistics landing page provides access to the pipeline incident flagged files (bulk CSV downloads for each of the four incident databases), the pipeline annual report data (mileage, infrastructure counts, and safety feature inventories by operator and state), and the enforcement case records documenting proposed and assessed penalties for completed enforcement actions. Incident CSV files are divided into historical archives and recent-years files to limit individual download sizes; the recent-years files cover approximately the last five calendar years and are updated annually.
OPSWEB is PHMSA's web-based operator reporting system and the intake point for all incident, annual report, and drug and alcohol testing submissions from regulated operators. The public-facing portion of OPSWEB at opsweb.phmsa.dot.gov allows filtered searches of the incident database by operator, state, date range, cause code, and commodity, with single-record drill-down and CSV export. It is the most user-friendly entry point for targeted queries but is not well-suited to bulk or programmatic access.
The National Pipeline Mapping System (NPMS) at npms.phmsa.dot.gov is a GIS platform publishing pipeline location data for transmission and hazardous liquid pipelines, as well as LNG facility locations. NPMS data is published as shapefile downloads by state and as an interactive map. Pipeline operators submit location data to NPMS under 49 CFR § 191.29, though gathering lines are subject to less complete location data requirements. Joining NPMS pipeline location data against PHMSA incident records on operator ID and state produces a spatial dataset of incidents by pipeline corridor — the foundation for pipeline risk density analysis and HCA mapping.
A REST API for PHMSA incident data is accessible at https://api.phmsa.dot.gov/api/incidents. The API accepts parameters for system type (gas_distribution, gas_transmission, hazardous_liquid, lng), date range, state, operator ID, and primary cause code, returning JSON with the same fields as the bulk CSV files. No authentication is required for public data endpoints. The API is useful for narrowly scoped queries (for example, pulling all hazardous liquid incidents in Texas from a specific operator over a defined date range) without downloading the full multi-megabyte database files.
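A narrowly scoped query of the kind described above might be assembled as follows. The endpoint and the system type values come from the text. The query parameter names (`system_type`, `state`, `operator_id`, `start_date`, `end_date`) and the operator ID are assumptions for illustration and should be checked against the API's own documentation. The sketch only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.phmsa.dot.gov/api/incidents"  # endpoint as given in the text


def build_incident_url(system_type: str, state: str, operator_id: int,
                       start_date: str, end_date: str) -> str:
    """Build a query URL; parameter names here are hypothetical."""
    params = {
        "system_type": system_type,
        "state": state,
        "operator_id": operator_id,
        "start_date": start_date,
        "end_date": end_date,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Hypothetical operator ID 12345, hazardous liquid incidents in Texas
url = build_incident_url("hazardous_liquid", "TX", 12345,
                         "2019-01-01", "2023-12-31")
print(url)
```

The resulting URL could then be fetched with `requests.get(url, timeout=30)` and parsed with `.json()`, assuming the service responds as the text describes.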
Database Structure and Key Fields
Each incident record in the PHMSA database contains several hundred fields covering the operational context of the pipeline segment, the circumstances of the incident, consequences, and operator response. The fields most valuable for systematic analysis across the full dataset include the following.
Operator name and operator ID. The operator ID is PHMSA's assigned identifier for each regulated entity and is the most reliable join key across the incident, annual report, mileage, and enforcement datasets. Operator names change across the decades-long historical record due to mergers, acquisitions, and rebranding; joining on operator ID rather than name is necessary for longitudinal analysis. For operators that have changed corporate structure significantly — Enbridge acquired the former Lakehead Pipe Line Partners system, for example — entity resolution against FERC corporate records is recommended before aggregating incident counts across the full history.
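The difference between grouping on name and grouping on ID can be shown with toy data. In this sketch the operator names, IDs, and column names are all hypothetical; the point is only that one entity split across two names is reunited when the ID is used as the key.

```python
import pandas as pd

# Hypothetical records: operator 101 was renamed between 2005 and 2021
incidents = pd.DataFrame({
    "operator_id": [101, 101, 202],
    "operator_name": ["Acme Pipeline Co", "Acme Midstream LLC", "Beta Gas"],
    "year": [2005, 2021, 2021],
})
mileage = pd.DataFrame({
    "operator_id": [101, 202],
    "miles": [1200.0, 300.0],
})

by_name = incidents.groupby("operator_name").size()  # splits operator 101 in two
by_id = incidents.groupby("operator_id").size()      # keeps operator 101 together

# validate= raises an error if the mileage table has duplicate operator IDs
joined = incidents.merge(mileage, on="operator_id", how="left",
                         validate="many_to_one")
print(by_name, by_id, joined, sep="\n\n")
```

The `validate="many_to_one"` argument is a cheap guard against silently duplicating incident rows when the lookup table is not unique on the key.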
Accident date and report date. The date the incident occurred and the date the operator filed the PHMSA written report. The 30-day filing window means that report dates lag accident dates by up to a month. The gap between accident date and report date is analytically informative: operators who consistently file near the end of the 30-day window for incidents involving significant property damage are worth flagging for further review, as this pattern may indicate deliberate delay in formalizing the incident record.
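The filing lag can be computed directly from the two date fields. The sketch below uses hypothetical column names and toy dates, and it assumes that "near the end of the window" means a lag of 25 days or more; that cutoff is an arbitrary choice for illustration.

```python
import pandas as pd

# Hypothetical filings; column names are illustrative
filings = pd.DataFrame({
    "operator_id": [101, 101, 101, 202, 202],
    "accident_date": ["2022-01-03", "2022-03-10", "2022-06-01",
                      "2022-02-01", "2022-05-15"],
    "report_date": ["2022-02-01", "2022-04-08", "2022-06-30",
                    "2022-02-10", "2022-05-20"],
})
for col in ("accident_date", "report_date"):
    filings[col] = pd.to_datetime(filings[col])

# Days between the incident and the written report
filings["lag_days"] = (filings["report_date"] - filings["accident_date"]).dt.days

LATE_CUTOFF_DAYS = 25  # assumption: day 25+ counts as "near the end" of 30 days
filings["late_filing"] = filings["lag_days"] >= LATE_CUTOFF_DAYS

summary = filings.groupby("operator_id").agg(
    n_filings=("lag_days", "size"),
    median_lag=("lag_days", "median"),
    late_share=("late_filing", "mean"),
)
print(summary)
```

A high `late_share` is only a screening signal. It identifies operators to look at more closely and says nothing by itself about intent.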
Primary cause and cause subcodes. PHMSA classifies the primary cause of each incident using a hierarchical taxonomy with seven top-level categories — corrosion, excavation damage, incorrect operation, material or weld failure, natural forces, other outside force, and all other causes — each with multiple subcodes. The primary cause field is the key variable for failure pattern analysis. Cause codes may be updated in amended reports after the investigation concludes; always use the most recently amended record.
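Selecting the most recently amended record per incident is a sort-and-deduplicate operation. This sketch assumes hypothetical column names (`report_number`, `report_received_date`, `cause`) and toy values; the actual PHMSA field names should be confirmed against the file in use.

```python
import pandas as pd

# Hypothetical data: report 20100001 has an original and an amended filing
reports = pd.DataFrame({
    "report_number": [20100001, 20100001, 20100002],
    "report_received_date": ["2010-08-20", "2012-07-15", "2010-09-01"],
    "cause": ["ALL OTHER CAUSES", "CORROSION", "EXCAVATION DAMAGE"],
})
reports["report_received_date"] = pd.to_datetime(reports["report_received_date"])

# Keep only the latest filing for each report number
latest = (
    reports.sort_values("report_received_date")
           .drop_duplicates("report_number", keep="last")
           .set_index("report_number")
)
print(latest["cause"].value_counts())
```

Counting causes on `latest` rather than on `reports` avoids both double-counting amended incidents and mixing preliminary with final cause determinations.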
Commodity transported. Natural gas, crude oil, refined products (gasoline, diesel, aviation fuel, heating oil), highly volatile liquids (propane, ethane, butane, natural gas liquids), carbon dioxide, or other. The commodity field determines the environmental risk profile and the appropriate comparison population for benchmarking. Crude oil and diluted bitumen spills present fundamentally different remediation challenges from refined product spills; grouping them without distinguishing commodity introduces analytical noise.
Volume released, recovered, and net loss. For hazardous liquid incidents, PHMSA captures total barrels released, barrels recovered through spill response operations, and net barrels lost to the environment. The net loss figure is the environmentally significant number; total released is misleading for spills where significant recovery occurred. For gas incidents, the equivalent field records total cubic feet released. Note that volume figures are operator-reported estimates and can carry substantial uncertainty for large spills where the release continued for hours before shutoff.
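The relationship between the three volume figures is net loss = released minus recovered. The sketch below uses hypothetical column names and toy volumes, and it treats a missing recovery figure as zero barrels recovered, which is a conservative assumption rather than PHMSA's convention.

```python
import pandas as pd

# Hypothetical spill volumes in barrels
spills = pd.DataFrame({
    "incident": ["A", "B", "C"],
    "released_bbl": [1000.0, 200.0, 50.0],
    "recovered_bbl": [900.0, 20.0, None],  # None: recovery not reported
})

recovered = spills["recovered_bbl"].fillna(0)  # assumption: missing means none recovered
spills["net_loss_bbl"] = (spills["released_bbl"] - recovered).clip(lower=0)
spills["recovery_rate"] = recovered / spills["released_bbl"]

ranked = spills.sort_values("net_loss_bbl", ascending=False)
print(ranked)
```

In this toy data incident A has the largest release but incident B has the largest net loss, which is the kind of reordering the paragraph above warns about.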
Fatalities, injuries, and property damage. Counts of deaths and hospitalizations attributable to the incident, plus an estimated property damage figure in original cost dollars (not replacement cost, and not including environmental remediation, which is captured in a separate field in more recent database versions). Property damage figures are notoriously underestimated in initial filings and frequently revised upward in amended reports. The $50,000 property damage threshold that triggers reporting is also stated in original cost, so the same nominal threshold captures progressively smaller incidents in real terms: an incident costing $50,000 in 1986 dollars represents considerably more physical damage than one costing $50,000 today, yet both sit exactly at the reporting line. Inflation adjustment of property damage figures is therefore required for longitudinal cost comparisons.
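Inflation adjustment is a ratio of price index values: constant-dollar amount = nominal amount × index(base year) / index(incident year). The index values in this sketch are hypothetical round numbers chosen for illustration only. A real analysis would substitute an official price index series.

```python
# Hypothetical index values for illustration only; replace with an official
# price index series before drawing any conclusions.
PRICE_INDEX = {1990: 100.0, 2005: 150.0, 2020: 200.0}


def to_constant_dollars(amount: float, year: int, base_year: int,
                        index: dict = PRICE_INDEX) -> float:
    """Convert a nominal dollar amount from `year` into `base_year` dollars."""
    return amount * index[base_year] / index[year]


# With these placeholder values, prices double between 1990 and 2020
adjusted = to_constant_dollars(50_000, year=1990, base_year=2020)
print(adjusted)
```

Applying the same conversion to every record before summing by year keeps early and late incidents on a comparable footing.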
Pipeline characteristics. Pipe diameter, material (steel, plastic, cast iron, wrought iron, other), wall thickness, installation year, seam type (seamless, electric resistance welded, electric fusion welded, lap welded, spiral welded), and operating pressure at the time of the incident. These fields enable analysis of failure rates by pipe vintage and technology — the key dimension for evaluating infrastructure replacement prioritization programs. Installation year is frequently missing or estimated for vintage pipe installed before systematic record-keeping; records showing installation year as “unknown” are common for pre-1940 distribution infrastructure.
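Grouping incidents by pipe vintage requires an explicit bucket for missing installation years so that those records are not silently dropped. The sketch below uses hypothetical column names and toy values, and it bins by decade as one possible choice of vintage grouping.

```python
import pandas as pd

# Hypothetical records; installation year arrives as text and may be missing
pipe = pd.DataFrame({
    "installation_year": ["1928", "UNKNOWN", "1956", "1987", "", "2004"],
    "material": ["CAST IRON", "CAST IRON", "STEEL", "STEEL", "STEEL", "PLASTIC"],
})

# Non-numeric entries become NaN instead of raising an error
year = pd.to_numeric(pipe["installation_year"], errors="coerce")


def vintage_label(y: float) -> str:
    """Map a year to a decade label, keeping missing years as their own group."""
    return "unknown" if pd.isna(y) else f"{int(y) // 10 * 10}s"


pipe["vintage"] = year.map(vintage_label)
counts = pipe.groupby(["vintage", "material"]).size()
print(counts)
```

These are incident counts, not failure rates. Turning them into rates needs a denominator of installed mileage by vintage and material, which comes from the annual report data rather than the incident files.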
Python: Downloading and Analyzing Hazardous Liquid Incidents
The following script downloads the PHMSA hazardous liquid significant incidents CSV, normalizes column names, filters to 2019–2023, and computes two outputs: the top ten operators by total net liquid lost in barrels over that period, and annual fatality and injury totals for hazardous liquid incidents. Both calculations use column-name detection to handle the minor field-name variation across different PHMSA CSV releases.
import requests, pandas as pd, io
# ---------------------------------------------------------------------------
# PHMSA Hazardous Liquid Significant Incidents Analysis
#
# Primary data source:
# PHMSA bulk CSV downloads at:
# https://www.phmsa.dot.gov/data-and-statistics/pipeline/pipeline-incident-flagged-files
#
# Separate files for each system type:
# hl_incident_up_to_5yrs.csv - hazardous liquid, last 5 years
# hl_incident_since_2010.csv - hazardous liquid, 2010 to present
# gas_transmission_incident_up_to_5yrs.csv
# gas_distribution_incident_up_to_5yrs.csv
#
# All files are updated annually. Column naming varies slightly across
# system types; normalize before combining.
#
# Reporting threshold for significant incidents:
# - Fatality or injury requiring hospitalization
# - Property damage >= $50,000 (original cost)
# - Unintentional fire or explosion
# - Hazardous liquid release >= 5 barrels to water body
# - Gas release >= 3 million cubic feet
# - Any release requiring evacuation
# ---------------------------------------------------------------------------
# PHMSA hazardous liquid significant incidents (CSV download)
url = "https://www.phmsa.dot.gov/sites/phmsa.dot.gov/files/data_statistics/pipeline/hl_incident_up_to_5yrs.csv"
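resp = requests.get(url, timeout=30)
resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
# Parse the raw bytes so that encoding='latin-1' is actually applied by pandas
df = pd.read_csv(io.BytesIO(resp.content), encoding='latin-1', low_memory=False)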
# Standardize column names
df.columns = [c.strip().lower().replace(' ', '_') for c in df.columns]
# Filter: year 2019-2023
df['year'] = pd.to_numeric(df.get('iyear', df.get('accident_year', 0)), errors='coerce')
recent = df[df['year'].between(2019, 2023)].copy()
# Top operators by total net loss (barrels)
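loss_candidates = [c for c in df.columns if 'net_loss' in c or 'volume' in c]
if not loss_candidates:
    raise KeyError("No net-loss or volume column found; inspect df.columns")
loss_col = loss_candidates[0]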
recent[loss_col] = pd.to_numeric(recent[loss_col], errors='coerce').fillna(0)
by_op = recent.groupby('operator_name')[loss_col].sum().nlargest(10)
print("Top 10 operators by liquid lost (barrels), 2019-2023:")
for op, vol in by_op.items():
    print(f" {op:<45} {vol:>10,.0f} bbls")
# Fatalities + injuries by year
# Detect columns by partial match; skip this step if either is missing
fat_cols = [c for c in df.columns if 'fatal' in c]
inj_cols = [c for c in df.columns if 'injur' in c]
if fat_cols and inj_cols:
    fat_col, inj_col = fat_cols[0], inj_cols[0]
    recent[fat_col] = pd.to_numeric(recent[fat_col], errors='coerce').fillna(0)
    recent[inj_col] = pd.to_numeric(recent[inj_col], errors='coerce').fillna(0)
    by_yr = recent.groupby('year')[[fat_col, inj_col]].sum()
    print("\nAnnual fatalities and injuries (hazardous liquid):")
    print(by_yr.to_string())
The encoding='latin-1' parameter is necessary because the PHMSA CSV files contain characters outside the ASCII range in operator names and remarks fields — most commonly in operator names containing accented characters from French Canadian or Spanish-language parent company names. The standard UTF-8 codec will raise a decoding error on these files without the encoding override.
The low_memory=False parameter makes pandas read each column in full before inferring its dtype, which avoids the mixed-type warnings that chunked inference can raise on a large file. PHMSA's CSV files occasionally encode missing numeric values as blank strings, which causes pandas to infer a column as object type before encountering the first numeric value. Reading all columns as strings initially (via dtype=str) and then applying pd.to_numeric with errors='coerce' is the most robust pattern for these files.
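The read-as-strings pattern can be sketched on a small in-memory file. The column names and the "N/A" placeholder below are hypothetical stand-ins for whatever the actual PHMSA release contains.

```python
import io

import pandas as pd

# Hypothetical CSV with a blank and a text placeholder in numeric columns
raw = io.StringIO(
    "operator_id,net_loss_bbls,fatal\n"
    "101,12.5,0\n"
    "202,,1\n"
    "303,N/A,\n"
)

# Read everything as text; keep blanks as empty strings rather than NaN
df = pd.read_csv(raw, dtype=str, keep_default_na=False)

# Convert only the columns that should be numeric; bad values become NaN
numeric_cols = ["net_loss_bbls", "fatal"]
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes)
print(df)
```

Identifier columns such as `operator_id` stay as strings under this approach, which also preserves any leading zeros that numeric inference would strip.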
The column detection logic for loss volume and fatality fields using list comprehension on column names is necessary because PHMSA has changed field names across database releases. The recent five-year files and the since-2010 files use slightly different naming conventions; writing detection logic against partial string matches rather than hardcoded column names makes the script resilient to the version of file being processed.
Extending this workflow: load the gas transmission incident file alongside the hazardous liquid file, normalize column names to a common schema, and concatenate into a single cross-system dataset. Add the PHMSA annual mileage data as a join on operator ID and year to compute mileage-normalized incident rates. Operators with high per-mile incident rates are flagged for further review against the PHMSA enforcement case database to identify whether high-incident operators have also been the subject of penalty actions — and whether assessed penalties are commensurate with the incident record.
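The mileage normalization step can be sketched as a merge on operator ID and year followed by a division. The data and column names below are hypothetical, and incidents per 1,000 miles is one reasonable choice of rate unit.

```python
import pandas as pd

# Hypothetical incident records and annual mileage by operator
incidents = pd.DataFrame({
    "operator_id": [101, 101, 101, 202],
    "year": [2021, 2021, 2022, 2022],
})
mileage = pd.DataFrame({
    "operator_id": [101, 101, 202, 202],
    "year": [2021, 2022, 2021, 2022],
    "miles": [1000.0, 1000.0, 250.0, 250.0],
})

counts = (
    incidents.groupby(["operator_id", "year"]).size()
             .rename("incidents").reset_index()
)

# Start from mileage so operator-years with zero incidents are retained
rates = mileage.merge(counts, on=["operator_id", "year"], how="left")
rates["incidents"] = rates["incidents"].fillna(0)
rates["per_1000_miles"] = rates["incidents"] / rates["miles"] * 1000

rates = rates.set_index(["operator_id", "year"])
print(rates)
```

In this toy data operator 202 has fewer incidents than operator 101 but a higher rate in 2022, because its network is a quarter of the size.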
Limitations and Cross-References
The PHMSA significant-incident threshold creates a selection floor: only events meeting one of the consequence thresholds enter the database. Minor releases that do not reach a body of water, do not cause a hospitalization, and do not result in more than $50,000 in property damage are not reported. This means the database documents the tail of the pipeline incident distribution. Near-miss events and minor leaks — which pipeline safety researchers treat as leading indicators of serious incident risk — are not systematically collected in the PHMSA significant incident files, though PHMSA does collect lower-tier reportable events in a separate dataset.
Property damage figures are operator-reported estimates and exclude environmental remediation costs in most historical records. The Kalamazoo River spill appears in the PHMSA database with a property damage figure representing immediate infrastructure damage; the $1.2 billion in cleanup costs is not captured in the PHMSA record and appears only in EPA CERCLA enforcement documentation and Enbridge financial disclosures. For incidents with significant environmental consequences, cross-referencing against EPA's Enforcement and Compliance History Online (ECHO) database and NRC spill notification records is required to construct a complete cost picture.
The NTSB investigates major pipeline accidents independently of PHMSA and publishes substantially more detailed reports covering root cause, contributing factors, and safety recommendations. Where an NTSB investigation exists for a PHMSA-recorded incident, the NTSB report is the authoritative source for understanding the technical failure and is frequently more informative than the PHMSA incident record alone. NTSB pipeline accident reports are searchable at ntsb.gov/investigations and include docket numbers that can be cross-referenced against PHMSA operator IDs for major incidents.
For methane releases from gas pipelines and storage facilities, the PHMSA incident database can be validated and supplemented using satellite methane detection data from TROPOMI (aboard ESA's Sentinel-5P) and GHGSat commercial satellites. Several published research studies have identified PHMSA-unreported methane releases from gathering and distribution systems using satellite and airborne methane measurements — a sign that the significant-incident threshold substantially undercounts total methane emissions from the pipeline system. Cross-referencing PHMSA gas release volumes against EPA's Greenhouse Gas Reporting Program (GHGRP) emissions figures for the same operators provides a systemic check on the completeness of incident reporting.