A single factory on the edge of town can show up in four separate federal databases at once—as a hazardous-waste generator, as a reporter of toxic chemical releases, as a permitted air-pollution source, and as the subject of inspections and penalties. Each agency view sees one slice; none sees the whole. The work of environmental data analysis is to put them back together: to take a facility's waste status, its pollution, its air permits, and its enforcement history and resolve them to one physical site—the EPA Facility 360.
This article covers the four EPA compliance datasets that, joined, produce a facility view: the RCRA hazardous-waste handler registry, the Toxic Release Inventory of industrial chemical releases, the ICIS-Air Clean Air Act compliance record, and the ECHO integrated enforcement layer. It explains the statutory frame behind each—the Resource Conservation and Recovery Act, the 1986 Emergency Planning and Community Right-to-Know Act, and the Clean Air Act—and what each dataset records. It then turns to the central engineering problem: the Facility Registry Service and the FRS registry ID that ties a single site to its program-specific identifiers across RCRA, TRI, and air, with ECHO as the integrated enforcement layer on top. It walks through the environmental-justice and enforcement questions the assembled data answers, a Python workflow that pulls a facility by FRS ID and attaches its waste status, releases, air compliance, and enforcement actions, and the caveats of cross-program matching that every analyst must internalize before trusting a join.
What the assembled dataset is
There is no single EPA file called “everything about a facility.” The agency regulates a site under several statutes, each administered by a different program with its own data system, its own identifier, and its own reporting cadence. What we hold are four of those data systems—the ECHO enforcement records, the RCRA handler registry, the TRI release data, and the ICIS-Air compliance data—each tied back to the EPA facility registry, so that the analytical work is aligning the registry and program IDs across them rather than parsing four unrelated sources. The point of assembling them is that no single view answers the questions environmental policy actually asks. The waste data knows what a facility throws away but not what it emits to air; the release data knows the tonnage of chemicals leaving the fenceline but not whether anyone inspected the site; the air data knows the permit but not the penalty. Only the join sees the facility whole.
In our database these live as four tables (epa_rcra, epa_tri, epa_icis_air, and epa_echo), and the column that makes them one facility is the FRS registry ID. The Facility Registry Service is EPA's authoritative directory of regulated places; it assigns each physical site a persistent registry identifier and maintains the crosswalk from that identifier to the program-specific IDs the site carries in each underlying system. The columns that matter for the join are therefore the registry ID and the program identifiers it links:
registry_id         -- FRS registry ID: the cross-program spine (one per site)
program_system      -- which program the link points to (RCRAINFO, TRIS, AIR, ...)
program_id          -- the site's ID within that program system
-- the program-specific identifiers the registry_id resolves to:
rcra_handler_id     -- RCRA: the hazardous-waste handler EPA ID number
tri_facility_id     -- TRI: the Toxic Release Inventory facility ID
air_pgm_sys_id      -- ICIS-Air: the Clean Air Act stationary-source ID
-- the facility attributes FRS carries for matching and display:
primary_name        -- facility name as registered
location_address    -- street address of the physical site
latitude, longitude -- geocoded coordinates for spatial joins
naics_codes         -- industry classification(s) for the facility
The registry_id is the load-bearing column, the spine of the entire assembly. Without it, each program ID is an island: a RCRA handler ID means nothing to the air system, and a TRI facility ID means nothing to RCRA. The FRS exists precisely to break that isolation. It is a curated crosswalk, built and maintained by EPA from the underlying program records, that declares “these program IDs all refer to the same place.” Once an analyst has a registry ID, the program_system and program_id pairs hanging off it become the keys into each program table (the RCRA handler ID into epa_rcra, the TRI facility ID into epa_tri, the air ID into epa_icis_air), while ECHO, which EPA already keys to the registry ID, layers the enforcement summary on top. The remaining columns (name, address, coordinates, NAICS industry codes) are what let an analyst confirm a match, place a facility on a map, and group facilities by industry: the attributes that turn a bare crosswalk into a usable facility profile.
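The relationship between the long-form link columns (program_system, program_id) and the wide-form identifier columns can be sketched with pandas. This is a minimal illustration, not the database's actual load step: the registry IDs and program IDs are invented, and the mapping from program-system acronym to column name is an assumption drawn from the column list above.

```python
import pandas as pd

# Toy FRS crosswalk in long form: one row per (site, program link).
# Every ID here is invented for illustration.
crosswalk = pd.DataFrame({
    "registry_id":    ["110000000001"] * 3 + ["110000000002"],
    "program_system": ["RCRAINFO", "TRIS", "AIR", "RCRAINFO"],
    "program_id":     ["RCRA-0001", "TRI-0001", "AIR-0001", "RCRA-0002"],
})

# Assumed mapping from program-system acronym to the wide column it fills.
column_for = {
    "RCRAINFO": "rcra_handler_id",
    "TRIS": "tri_facility_id",
    "AIR": "air_pgm_sys_id",
}

# Pivot to one row per site, with one column per program identifier.
wide = (
    crosswalk.assign(column=crosswalk["program_system"].map(column_for))
    .dropna(subset=["column"])            # drop links to systems we do not map
    .pivot_table(index="registry_id", columns="column",
                 values="program_id", aggfunc="first")
    .reindex(columns=list(column_for.values()))
)
print(wide)
```

A site that is absent from a program simply has a missing value in that column, which is the shape the later joins expect.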
RCRA: who generates, transports, and disposes of hazardous waste
The Resource Conservation and Recovery Act (RCRA), enacted in 1976, is the federal statute that governs hazardous waste from the moment it is generated to the moment it is finally disposed of—the “cradle-to-grave” framework. EPA implements RCRA largely through authorized state programs, and the data lives in RCRAInfo, the national system that tracks every regulated handler of hazardous waste. A handler is any facility that generates hazardous waste, transports it, or treats, stores, or disposes of it—categories the regulations call generators (binned by how much they produce), transporters, and treatment, storage, and disposal facilities (TSDFs). Each handler is identified by an EPA hazardous-waste ID number, the key into the epa_rcra table.
For the facility view, RCRAInfo supplies the waste dimension: whether a site is a large- or small-quantity generator, what waste streams it handles, whether it operates a permitted treatment or disposal unit, and—importantly for enforcement—whether it is under corrective action for past contamination. RCRA corrective action is the program for cleaning up releases of hazardous waste and constituents at active and former handler sites, and a facility carrying a corrective-action obligation is a facility with a known contamination history. Joined to the rest of the 360 view, the RCRA record answers questions the other datasets cannot: a facility that is a large-quantity generator and is also a top toxic-release reporter and is also out of compliance with its air permit is a different risk profile than a site that triggers only one of the three—and only the join reveals the overlap.
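The overlap described above can be sketched as a count of program flags per site. The three flag columns are hypothetical names for values an analyst would derive from the program tables, and the rows are made-up data.

```python
import pandas as pd

# One row per site, with one boolean flag per program (toy data).
profile = pd.DataFrame({
    "registry_id":      ["A", "B", "C"],
    "rcra_lqg":         [True, True, False],   # large-quantity generator
    "tri_top_releaser": [True, False, False],  # e.g. top decile by reported pounds
    "air_noncompliant": [True, False, True],   # out of compliance in ICIS-Air
}).set_index("registry_id")

flag_cols = ["rcra_lqg", "tri_top_releaser", "air_noncompliant"]

# Booleans sum as 0/1, so this counts how many programs flag each site.
profile["flags"] = profile[flag_cols].sum(axis=1)

# Sites flagged in all three programs at once.
overlap = profile.index[profile["flags"] == len(flag_cols)].tolist()
print(overlap)
```

Sites A and C both carry an air flag, but only A is flagged in every program, and that distinction is invisible in any single table.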
The Toxic Release Inventory: how much, of what, to where
The Toxic Release Inventory (TRI) is the most community-facing of the four datasets, and it exists because of a disaster. After the 1984 Bhopal release in India and a serious chemical release in West Virginia, Congress passed the Emergency Planning and Community Right-to-Know Act (EPCRA) in 1986, whose central premise is that communities have a right to know what hazardous chemicals are stored and released near them. Section 313 of EPCRA created the TRI: covered industrial facilities above employee and chemical-use thresholds must report, every year, how much of each listed toxic chemical they release (to the air, to water, to land on-site, or by transfer off-site) and how much they manage through recycling, energy recovery, and treatment. The program covers hundreds of individual chemicals and chemical categories, and the data is reported facility by facility, chemical by chemical, year by year.
TRI is what makes the facility view quantitative about pollution. RCRA tells you a site handles hazardous waste; TRI tells you how many pounds of a named chemical it releases and into which medium. That distinction is what drives most environmental-justice analysis: TRI release quantities, geocoded by facility, are what let researchers map where industrial releases concentrate, overlay them on demographic data, and ask whether the heaviest releasers cluster in particular communities. The TRI facility ID is the key into the epa_tri table, and the FRS crosswalk is what connects that TRI ID to the same site's RCRA handler ID and air permit—so a facility's reported release tonnage can be set beside whether it was inspected, whether it was found out of compliance, and whether it was penalized. A large reported release with no enforcement footprint is a very different finding than a large release that drew a formal action, and the join is what surfaces the difference.
ICIS-Air: Clean Air Act permits and compliance
The Clean Air Act, the foundational 1970 statute (with major amendments in 1977 and 1990), is the law under which EPA and the states regulate air pollution from stationary sources—the factories, power plants, refineries, and other fixed facilities that emit pollutants to the atmosphere. The compliance and permitting data for these stationary sources lives in ICIS-Air, the Integrated Compliance Information System module for the Clean Air Act, which superseded the older AFS (AIRS Facility Subsystem). For a given facility, ICIS-Air records its air-program classification (notably whether it is a major source subject to the most stringent Title V operating-permit requirements, or a synthetic-minor or area source), the applicable programs, its compliance status, and the inspections and stack tests conducted against it.
In the 360 view, ICIS-Air supplies the air-permit and air-compliance dimension that neither the waste nor the release data carries. A facility's Clean Air Act status is often the best single indicator of its regulatory weight: major sources are subject to continuous monitoring, periodic stack testing, and Title V permit obligations, and a major source out of compliance is a high-priority enforcement target. The air program identifier—the ICIS-Air program-system ID—is the key into epa_icis_air, and the FRS crosswalk ties it to the same site's waste and release records. Setting the air compliance status beside the TRI air-release tonnage is especially informative: a facility reporting large air releases under TRI while its ICIS-Air record shows it in compliance with its permit is a reminder that permitted, lawful releases can still be large—the permit governs how a facility pollutes, not whether the resulting burden on a community is acceptable.
ECHO: the integrated enforcement layer
ECHO—Enforcement and Compliance History Online—is the dataset that sits on top of the other three. Where RCRA, TRI, and ICIS-Air are program-specific, ECHO is EPA's integrated compliance and enforcement platform: it pulls together inspections, violations, formal and informal enforcement actions, and penalties across the major environmental statutes—the Clean Air Act, the Clean Water Act, RCRA, and others—and presents them as a single facility-level compliance history. ECHO is already keyed to the FRS registry ID, which is precisely why it functions as the integrating layer: it does not need to be crosswalked the way the program tables do, because EPA built it on the registry from the start.
What ECHO contributes to the facility view is the enforcement record—the answer to “and what did the government do about it?” For a facility it records the count and dates of inspections, the violations found, the compliance status under each program, whether the facility has been in significant or high-priority non-compliance, the formal enforcement actions taken, and the penalties assessed. It is the layer that turns a profile of pollution into a profile of accountability. The epa_echo table, keyed by registry ID, is what lets an analyst line up a facility's enforcement actions against its waste status, its release tonnage, and its air-compliance record—so the central enforcement question, whether penalties actually follow the worst actors, becomes answerable at the level of the individual site. ECHO's own data feeds and downloadable services are public and key-free, which is what makes the whole assembly tractable without special access.
The Facility Registry Service: the spine of the join
Everything above depends on one piece of infrastructure: the Facility Registry Service (FRS). The problem the FRS solves is deceptively hard. The same physical refinery can appear in RCRAInfo under one ID, in TRI under a different ID, and in ICIS-Air under a third, each entered at a different time by a different program, with the name spelled differently, the address formatted differently, and no shared key. Naive matching on name and address fails constantly: the same company runs many sites, addresses are entered inconsistently, and a single campus may be one place to one program and several places to another. The FRS is EPA's curated answer: a master directory that assigns each real-world facility a single registry ID and records, for that ID, the set of program-system links (the RCRA handler ID, the TRI facility ID, the air ID) that all resolve to the same site.
For the analyst, this changes the nature of the work entirely. Instead of attempting to re-derive which program IDs belong together—an error-prone exercise in fuzzy matching—you start from the registry ID and read the crosswalk EPA has already built. Pull a facility by its FRS ID, and the FRS hands back its program IDs; use each program ID to query the corresponding program table; key ECHO directly by the registry ID; and the facility's waste status, its toxic releases, its air permits, and its enforcement history line up against one another. The FRS is the spine, and the program-specific identifiers are the ribs hanging off it. Practically, the registry ID is what you carry through the entire pipeline: it is the join key in every table, the unit of every facility-level aggregation, and the identifier you use to deduplicate when the same site would otherwise be counted once per program it appears in.
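The deduplication point can be made concrete with a small sketch. The rows below are invented, and the tuple layout is an assumption for illustration only; the idea is that counting program records overstates the number of sites, while grouping on the registry ID counts each physical place once.

```python
from collections import defaultdict

# Toy program records: (program_system, program_id, registry_id).
# All IDs are invented for illustration.
program_rows = [
    ("RCRAINFO", "RCRA-0001", "110000000001"),
    ("TRIS",     "TRI-0001",  "110000000001"),
    ("AIR",      "AIR-0001",  "110000000001"),
    ("RCRAINFO", "RCRA-0002", "110000000002"),
]

# Group the program systems each site appears in, keyed by registry ID.
programs_by_site = defaultdict(set)
for system, _program_id, registry_id in program_rows:
    programs_by_site[registry_id].add(system)

naive_count = len(program_rows)      # counts a site once per program record
site_count = len(programs_by_site)   # counts each physical site once

print(naive_count, site_count)
```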
The environmental-justice and enforcement questions the join answers
Assembled into one facility view, the four datasets answer the questions that sit behind environmental policy and environmental-justice analysis—questions no single dataset can answer, because each requires pollution and enforcement to be measured at the same site.
Which facilities pollute the most, and are they inspected and penalized? This is the core accountability question, and it requires the TRI release tonnage and the ECHO enforcement record on the same site. Ranking facilities by reported releases and then attaching their inspection counts, compliance status, and penalties reveals whether the heaviest releasers actually draw enforcement attention—or whether the largest reported releases come from facilities that have rarely been inspected and never penalized. A high release with no enforcement footprint is exactly the gap the join exists to expose.
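A minimal sketch of that ranking, assuming release totals and an enforcement summary have already been reduced to one row per registry ID. The column names and figures are hypothetical. Note one loud assumption: a site with no ECHO row is filled with zeros here, which conflates "no record" with "zero activity" and should be revisited in real work.

```python
import pandas as pd

# Toy per-site totals (made-up numbers).
tri = pd.DataFrame({
    "registry_id": ["S1", "S2", "S3", "S4"],
    "total_release_lb": [900_000, 450_000, 120_000, 5_000],
})
echo = pd.DataFrame({
    "registry_id": ["S1", "S3"],
    "inspections_5yr": [4, 1],
    "formal_actions_5yr": [2, 0],
    "penalties_5yr": [150_000.0, 0.0],
})

# Left join keeps every releaser, including those with no enforcement row.
ranked = (
    tri.merge(echo, on="registry_id", how="left")
    .fillna({"inspections_5yr": 0, "formal_actions_5yr": 0, "penalties_5yr": 0.0})
    .sort_values("total_release_lb", ascending=False)
    .reset_index(drop=True)
)

# A site with neither inspections nor formal actions has no enforcement footprint.
ranked["no_enforcement_footprint"] = (
    (ranked["inspections_5yr"] == 0) & (ranked["formal_actions_5yr"] == 0)
)
gap = ranked.loc[ranked["no_enforcement_footprint"], "registry_id"].tolist()
print(ranked)
print(gap)
```

In this toy data the second-largest releaser, S2, lands in the gap list, which is the pattern the question is designed to surface.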
How do releases cluster in particular communities? Because the FRS carries geocoded coordinates, the assembled facility view can be placed on a map and overlaid on demographic data. Aggregating TRI release tonnage (weighted by chemical toxicity where the analysis calls for it) to the neighborhood, census tract, or county level, and joining that to the demographics of the population living there, is the mechanism behind the long-running finding that industrial releases and the facilities that generate them concentrate disproportionately in lower-income communities and communities of color. The 360 view adds enforcement to that picture: it lets the analysis ask not only where the releases are, but whether the facilities releasing in those communities are held to the same enforcement standard as facilities elsewhere.
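The aggregation step can be sketched as a group-by followed by a join. The sketch assumes each facility has already been assigned to a census tract from its FRS coordinates (the spatial join itself is not shown), and the tract codes, populations, and income shares are invented.

```python
import pandas as pd

# Facilities already assigned to a tract (toy data).
facilities = pd.DataFrame({
    "registry_id": ["S1", "S2", "S3"],
    "tract":       ["T100", "T100", "T200"],
    "release_lb":  [300_000, 100_000, 50_000],
})

# Tract-level demographics (hypothetical columns and values).
demographics = pd.DataFrame({
    "tract":          ["T100", "T200"],
    "population":     [4_000, 5_000],
    "pct_low_income": [0.42, 0.11],
})

# Sum releases and count distinct sites per tract, then attach demographics.
by_tract = (
    facilities.groupby("tract", as_index=False)
    .agg(facilities=("registry_id", "nunique"), release_lb=("release_lb", "sum"))
    .merge(demographics, on="tract", how="left")
)
by_tract["release_lb_per_capita"] = by_tract["release_lb"] / by_tract["population"]
print(by_tract)
```

Raw pounds are summed here for simplicity; a toxicity-weighted column could be aggregated the same way.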
Does enforcement actually follow the worst actors, and how do penalties compare to the scale of the pollution? This is where ECHO and the program data must be read together. By setting a facility's formal-action count and total penalties beside its release tonnage, its waste-generator status, and its air-major classification, an analyst can test whether the enforcement effort is proportional to the burden a facility imposes—or whether penalties are small relative to the scale of the releases, a recurring critique of environmental enforcement. The join is also what enables the inverse, retrospective question: of the facilities that drew the largest penalties, how many were already flagged across multiple programs, and could the cross-program signal have identified them as high-risk before the violation matured.
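One simple way to express proportionality is penalty dollars per reported pound released. This is only an illustrative metric with made-up figures and hypothetical field names, not an established EPA measure, and it inherits every caveat about raw pounds as a proxy for harm.

```python
# Toy per-site totals (made-up numbers).
facilities = [
    {"registry_id": "S1", "release_lb": 900_000, "penalties_5yr": 45_000.0},
    {"registry_id": "S2", "release_lb": 10_000,  "penalties_5yr": 20_000.0},
    {"registry_id": "S3", "release_lb": 0,       "penalties_5yr": 5_000.0},
]


def penalty_per_pound(row):
    """Penalty dollars per reported pound, or None when no release is reported."""
    if row["release_lb"] <= 0:
        return None  # ratio is undefined; do not report it as zero or infinity
    return row["penalties_5yr"] / row["release_lb"]


ratios = {f["registry_id"]: penalty_per_pound(f) for f in facilities}
for registry_id, ratio in ratios.items():
    print(registry_id, ratio)
```

Here the largest releaser pays the least per pound, which is the shape of the critique described above.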
Python workflow: assembling a facility from its FRS ID
The script below pulls a single facility by its FRS registry ID and assembles the 360 view. It queries the Facility Registry Service to resolve the registry ID into its program-specific identifiers, uses each program ID to pull the matching record—the RCRA generator status from the handler table, the TRI facility record, the ICIS-Air compliance record—and queries ECHO directly by the registry ID for the integrated enforcement summary, lining up formal actions and penalties against the pollution profile. All of these are public EPA web services and none requires an API key. Because the program tables and their column names vary between Envirofacts releases, any production use should be validated against the current Envirofacts metadata and should page through full result sets for facilities with many records.
import pandas as pd
import requests

# EPA public web services -- no API key required for any of these.
#   FRS   Facility Registry Service: the cross-program spine
#   ECHO  Enforcement and Compliance History Online (enforcement)
#   Envirofacts efservice REST for RCRA, TRI, and ICIS-Air program data
FRS = "https://ofmpub.epa.gov/frs_public2/frs_rest_services.get_facilities"
ECHO = "https://echodata.epa.gov/echo/echo_rest_services.get_facility_info"
EF = "https://data.epa.gov/efservice"


def frs_programs(registry_id):
    # One FRS registry ID resolves to a set of program-system links --
    # the RCRA handler ID, the TRI facility ID, the air (ICIS) ID, etc.
    r = requests.get(FRS, params={"registry_id": registry_id,
                                  "output": "JSON"}, timeout=120)
    r.raise_for_status()
    facs = r.json().get("Results", {}).get("FRSFacility", [])
    links = {}
    for f in facs:
        for pl in f.get("ProgramFacility", []) or []:
            # Keyed by acronym: a later link for the same system overwrites an earlier one.
            links[pl.get("ProgramSystemAcronym")] = pl.get("ProgramId")
    return links


def _ef(table, col, val, fmt="JSON", page=10000):
    # Envirofacts path grammar: /TABLE/COLUMN/VALUE/FORMAT/rows/0:N
    path = f"{EF}/{table}/{col}/{val}/{fmt}/rows/0:{page}"
    r = requests.get(path, timeout=120)
    r.raise_for_status()
    return pd.DataFrame(r.json())


def echo_enforcement(registry_id):
    # ECHO returns the integrated compliance + enforcement summary,
    # including formal-action counts and total penalties, by FRS id.
    r = requests.get(ECHO, params={"p_frs": registry_id,
                                   "output": "JSON"}, timeout=120)
    r.raise_for_status()
    rows = r.json().get("Results", {}).get("Facilities", [])
    return rows[0] if rows else {}


def facility_360(registry_id):
    links = frs_programs(registry_id)
    print(f"FRS {registry_id} -> program IDs: {links}")

    # --- RCRA hazardous-waste status -------------------------------------
    rcra_id = links.get("RCRAINFO")
    if rcra_id:
        h = _ef("rcra_handler", "handler_id", rcra_id)
        gen = h.get("fed_waste_generator", pd.Series(dtype=str)).head(1)
        print(f"  RCRA generator status: {gen.to_list()}")

    # --- TRI facility record ---------------------------------------------
    tri_id = links.get("TRIS") or links.get("TRI")
    if tri_id:
        rel = _ef("tri_facility", "tri_facility_id", tri_id)
        print(f"  TRI facility rows: {len(rel)}")

    # --- Clean Air Act compliance (ICIS-Air) -----------------------------
    air_id = links.get("AIR") or links.get("AIRS/AFS")
    if air_id:
        af = _ef("icis_air_facilities", "pgm_sys_id", air_id)
        print(f"  ICIS-Air facility rows: {len(af)}")

    # --- ECHO enforcement: penalties vs. pollution -----------------------
    e = echo_enforcement(registry_id)
    penalties = float(e.get("Last5YrsPenalties", 0) or 0)
    formal = int(e.get("Last5YrsFormalActions", 0) or 0)
    print(f"  ECHO 5-yr formal actions: {formal:,}; "
          f"penalties: ${penalties:,.0f}")
    return links, e


# Pull one facility by its FRS registry ID and assemble the 360 view.
facility_360("110000350799")
Two practical notes apply. First, the program-system acronyms returned by the FRS are the join contract, and they are not perfectly uniform: the air link may appear under AIR or under the legacy AIRS/AFS, and the TRI link under TRIS or TRI, which is why the script checks alternates rather than assuming a single spelling. Production code should treat the acronym set as data to be inspected, not hard-coded. Second, the single-facility, API-by-facility pattern shown here is ideal for profiling one site or a handful, but it does not scale to national analysis. EPA's ECHO bulk-data downloads, the RCRAInfo and TRI public data files on epa.gov, and the FRS national combined files ship the full crosswalk and program records in bulk—far more efficient than calling the APIs facility by facility, and the right foundation for any study that ranks facilities, maps releases against demographics, or measures penalties against pollution at scale.
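Treating the acronym set as data can be sketched as a lookup table plus a bucket for anything unrecognized. The alternates below are only the ones named in this article; the canonical keys and the function name are illustrative choices, and the unknown acronym in the usage lines is a made-up placeholder.

```python
# Alternate spellings mapped to one canonical key. Only the alternates named
# in the article are listed; extend this after inspecting real FRS output.
CANONICAL = {
    "AIR": "AIR",
    "AIRS/AFS": "AIR",
    "TRIS": "TRI",
    "TRI": "TRI",
    "RCRAINFO": "RCRA",
}


def normalize_links(links):
    """Split FRS links into recognized (canonical key) and unrecognized ones."""
    known, unknown = {}, {}
    for acronym, program_id in links.items():
        key = CANONICAL.get((acronym or "").strip().upper())
        if key is None:
            unknown[acronym] = program_id     # surface it instead of dropping it
        else:
            known.setdefault(key, program_id)  # keep the first ID seen per program
    return known, unknown


known, unknown = normalize_links(
    {"AIRS/AFS": "AIR-0001", "TRIS": "TRI-0001", "SOME_OTHER_SYSTEM": "X-0001"}
)
print(known)
print(unknown)
```

Logging the unknown bucket across a batch of facilities is a cheap way to discover acronyms the pipeline does not yet handle.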
Limitations and analytical caveats
The assembled facility view is the most complete public picture of a regulated site's environmental record, but the join itself introduces limitations that an analyst must hold in mind before drawing conclusions.
The FRS crosswalk is curated, not perfect. Matching disparate program records to a single physical site is genuinely hard, and the FRS, however carefully maintained, contains both kinds of error: it can split one real facility across two registry IDs, undercounting how many programs a site actually appears in, and it can merge two distinct facilities under one ID, falsely attributing one site's pollution to another. A facility that appears to trigger only one program may in fact appear in others under a registry ID the crosswalk failed to link. Any analysis that depends on a complete cross-program profile should treat the FRS links as high-quality but fallible, and spot-check the highest-stakes matches.
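One possible spot-check is a distance test: if two program records linked to the same registry ID sit far apart, the link deserves a manual look. The sketch assumes each program record carries its own coordinates, which may not hold for every source, and the 2 km threshold and sample points are arbitrary illustrations.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def suspicious_links(records, max_km=2.0):
    """Pairs of program records for ONE registry ID that lie more than max_km apart."""
    flagged = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            d = haversine_km(a["latitude"], a["longitude"],
                             b["latitude"], b["longitude"])
            if d > max_km:
                flagged.append((a["program_system"], b["program_system"], round(d, 1)))
    return flagged


# Toy records for a single registry ID (coordinates are invented).
records = [
    {"program_system": "RCRAINFO", "latitude": 38.000, "longitude": -81.000},
    {"program_system": "TRIS",     "latitude": 38.001, "longitude": -81.000},
    {"program_system": "AIR",      "latitude": 38.200, "longitude": -81.000},
]
flagged = suspicious_links(records)
print(flagged)
```

A flagged pair is not proof of a bad merge, since large campuses exist, but it narrows the set of links worth checking by hand.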
The four datasets are reported on different cadences and thresholds. TRI is an annual self-report by facilities above employee and chemical thresholds; many smaller sites release covered chemicals but never cross the reporting threshold and so never appear, which means absence from TRI is not evidence of no releases. RCRA generator status changes with how much waste a site produces in a period. ICIS-Air and ECHO reflect inspection and enforcement activity, which is itself a choice of where to look. Comparing a recent enforcement action against a release figure from an earlier reporting year requires care about which year each number describes, and aggregating across the four datasets means reconciling four different reporting clocks.
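Reconciling the clocks can be as simple as recording, for every enforcement action, which reporting year the attached release figure describes. The sketch below pairs each action with the latest release on or before the action year; that pairing rule, the field names, and the numbers are all assumptions for illustration.

```python
# Toy TRI totals keyed by (registry_id, reporting_year). Made-up numbers.
releases = {("S1", 2019): 500_000, ("S1", 2021): 320_000}

# Toy enforcement actions with the year each was taken.
actions = [
    {"registry_id": "S1", "action_year": 2022, "penalty": 40_000.0},
    {"registry_id": "S1", "action_year": 2018, "penalty": 10_000.0},
]


def latest_release_on_or_before(registry_id, year):
    """Return (reporting_year, pounds) for the most recent report up to `year`."""
    years = [y for (rid, y) in releases if rid == registry_id and y <= year]
    if not years:
        return None
    y = max(years)
    return y, releases[(registry_id, y)]


aligned = []
for a in actions:
    match = latest_release_on_or_before(a["registry_id"], a["action_year"])
    aligned.append({
        **a,
        "release_year": match[0] if match else None,
        "release_lb": match[1] if match else None,
        # Keep the gap explicit so stale comparisons are visible downstream.
        "year_gap": (a["action_year"] - match[0]) if match else None,
    })

for row in aligned:
    print(row)
```

Carrying release_year and year_gap alongside the figure keeps the mismatch visible instead of burying it in a single merged number.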
Reported releases are not the same as exposure, and permitted is not the same as harmless. TRI quantities are reported pounds released to a medium, not modeled doses to people; toxicity varies enormously by chemical, and a small release of a highly toxic compound can pose more risk than a large release of a benign one, which is why serious analysis weights releases by toxicity rather than summing raw pounds. And a facility in full compliance with its Clean Air Act permit can still impose a heavy burden on its neighbors—the permit constrains how a site pollutes, not whether the cumulative load on a community is just. Treating compliance as a clean bill of health, or raw release tonnage as a measure of harm, over-reads what the data can bear.
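The effect of toxicity weighting on a ranking can be shown with two invented chemicals. The weights here are placeholders chosen to make the point; a real analysis would take them from a published source such as EPA's RSEI model rather than from this sketch.

```python
# Hypothetical relative toxicity weights (placeholders, not real values).
weights = {"chemical_A": 1.0, "chemical_B": 5_000.0}

# Toy releases: (site, chemical, pounds).
releases = [
    ("S1", "chemical_A", 100_000),  # large release, low toxicity weight
    ("S2", "chemical_B", 200),      # small release, high toxicity weight
]

raw, weighted = {}, {}
for site, chemical, pounds in releases:
    raw[site] = raw.get(site, 0) + pounds
    weighted[site] = weighted.get(site, 0.0) + pounds * weights[chemical]

# Rank sites from highest to lowest under each measure.
rank_raw = sorted(raw, key=raw.get, reverse=True)
rank_weighted = sorted(weighted, key=weighted.get, reverse=True)

print(rank_raw, rank_weighted)
```

The two rankings reverse, which is why summing raw pounds across chemicals can point enforcement attention at the wrong site.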
Enforcement absence is ambiguous. A facility with no recorded violations or penalties may be genuinely compliant, or may simply never have been inspected. Because inspection is a resource-constrained choice, the absence of an enforcement footprint reflects both a site's behavior and the attention it received, and the two cannot be separated from the data alone. This is precisely why the cross-program join is valuable: a facility that is a major air source, a large-quantity waste generator, and a substantial toxic releaser, yet shows no enforcement history, is not necessarily clean; it may be a site the enforcement effort has not reached, which is itself a finding worth surfacing.
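The ambiguity can at least be made explicit in the output by keeping "never inspected" separate from "inspected and clean". The function below is a hypothetical labeling scheme, and its three inputs are assumed to be counts derived from the ECHO record over some chosen window.

```python
def enforcement_reading(inspections, violations, formal_actions):
    """Label what an enforcement record does and does not tell us."""
    if formal_actions > 0:
        return "enforced"
    if violations > 0:
        return "violations found, no formal action"
    if inspections > 0:
        return "inspected, no violations found"
    # No inspections at all: the data cannot say whether the site complies.
    return "never inspected, compliance unknown"


# Toy sites: (inspections, violations, formal_actions).
sites = {"S1": (3, 2, 1), "S2": (2, 0, 0), "S3": (0, 0, 0)}
readings = {rid: enforcement_reading(*counts) for rid, counts in sites.items()}
for rid, label in readings.items():
    print(rid, label)
```

Reporting the last category separately, instead of folding it into a "no violations" count, keeps uninspected sites from reading as clean ones.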
Held with these caveats, the four datasets (epa_rcra, epa_tri, epa_icis_air, and epa_echo, knit together on the FRS registry ID) turn four partial views into one accountable picture of a facility: what it throws away, what it releases, what it is permitted to emit, and what the government did when it crossed the line. The spine is the registry ID; the rest is alignment.
Related writing
Following EPA enforcement: using ECHO data to track environmental violations and penalties — The ECHO layer is the enforcement spine of the facility view, and a dedicated treatment of how to read its inspections, formal actions, and penalties is the natural companion to assembling them alongside waste, release, and air data.
EPA RCRA Hazardous Waste Data: The Federal Database Behind 400,000 Regulated Facilities — The RCRA handler registry supplies the waste dimension of the 360 view, and a full account of its generator categories, TSDFs, and corrective-action program deepens the join described here.
EPA Toxic Release Inventory: 35 Years of Industrial Chemical Releases and Environmental Justice Patterns — The TRI release data is what makes the facility view quantitative about pollution, and its history under EPCRA and its central role in environmental-justice mapping are exactly the foundation the cross-program join builds on.