
EPA ICIS-Air: The Federal Database Behind Clean Air Act Stationary Source Compliance

11 min read · AI Analytics
Tags: EPA, Clean Air Act, Air Quality, ICIS, Federal Data

Every factory, refinery, power plant, and chemical works in America that emits to the air sits somewhere on a spectrum that runs from in compliance to High Priority Violator, and EPA keeps the ledger in the air module of its Integrated Compliance Information System, ICIS-Air. Surfaced through the ECHO platform, that ledger covers 279,262 stationary sources, each carrying its federal Registry ID, its Clean Air Act program classification, the pollutants it is permitted to emit, its current compliance status, a High Priority Violator flag, the date it was last given a full compliance evaluation, and the formal enforcement actions taken against it.

This article covers what ICIS-Air is and how it fits the Clean Air Act's stationary-source program; the statutory architecture of National Ambient Air Quality Standards, Title V operating permits, New Source Performance Standards, and the hazardous air pollutant standards; how sources are classified as major, synthetic minor, or true minor by their potential to emit; the Compliance Monitoring Strategy and the role of Full Compliance Evaluations, stack tests, and annual compliance certifications; the High Priority Violator policy and what the HPV flag actually means; how the air-facilities table joins outward to the enforcement-defendants and pollutant-emissions datasets through the Registry ID; the environmental-justice overlays that reveal where major sources concentrate; the analytical uses from mapping major sources to finding coverage gaps; a worked Python example against the ECHO air API that computes HPV rates and FCE coverage by state; and the caveats—self-certification, the FCE backlog, minor-source under-coverage, and registry-ID matching—that every analyst must internalize.

What ICIS-Air is and the Clean Air Act stationary-source program

The modern Clean Air Act dates to 1970, when Congress rebuilt a weak earlier statute into the framework that still governs air quality today, and to the sweeping 1990 Amendments, which added the operating-permit program and the modern hazardous-air-pollutant regime. The Act divides the universe of air pollution into mobile sources (cars, trucks, engines) and stationary sources (fixed industrial facilities). ICIS-Air is the federal compliance system for the stationary side: it is where EPA, together with the state and local air agencies that do most of the day-to-day work, records who is permitted to emit what, whether they are complying, and what happens when they are not.

ICIS-Air is the air module of the broader Integrated Compliance Information System (ICIS), the same system of record that holds EPA's enforcement cases. It replaced a long-running legacy system, the AIRS Facility Subsystem (AFS), consolidating stationary-source compliance and enforcement data into the integrated ICIS model. The public face of that data is ECHO—Enforcement and Compliance History Online, at echo.epa.gov—which exposes the ICIS-Air facility records alongside the water, hazardous-waste, and toxics program data. When this article refers to data drawn from ICIS-Air and surfaced via ECHO, it means the stationary-source compliance records made publicly queryable through ECHO's air search and air web services.

The bedrock of the whole program is the set of National Ambient Air Quality Standards (NAAQS), established under Clean Air Act Section 109 for the six criteria pollutants: fine and coarse particulate matter (PM2.5 and PM10), ozone, sulfur dioxide, nitrogen dioxide, carbon monoxide, and lead. The NAAQS define the ambient concentrations the air must not exceed. Areas that meet a standard are attainment areas; areas that exceed it are nonattainment areas, where tighter requirements apply to sources. The stationary-source program exists to drive ambient air toward those standards, and three regulatory tools do most of the work: operating permits, new-source performance standards, and hazardous-air-pollutant standards.

Title V operating permits, created by the 1990 Amendments, are the central instrument for large sources. A Title V permit is a single, comprehensive, federally enforceable document that gathers every Clean Air Act requirement applicable to a major source into one place, with monitoring, recordkeeping, and reporting obligations attached to each. Title V did not create new emission limits so much as make the existing patchwork of limits enforceable and auditable through one permit per facility. A facility's Title V status is one of the most important things ICIS-Air records, because a major source operating under a Title V permit is held to a far more rigorous compliance regime than a small source is.

New Source Performance Standards (NSPS), under Section 111, set technology-based emission limits for specific categories of new, modified, or reconstructed sources, reflecting the best system of emission reduction that has been adequately demonstrated for that category. NSPS are forward-looking: they apply when a facility is built or significantly changed, on the logic that the cheapest time to require good controls is at construction. National Emission Standards for Hazardous Air Pollutants (NESHAP), under Section 112, address the air toxics rather than the criteria pollutants. The 1990 Amendments listed 189 (now 188) hazardous air pollutants and directed EPA to set technology-based standards, the Maximum Achievable Control Technology (MACT) standards, category by category, followed by residual-risk review. A facility subject to a NESHAP or NSPS carries that fact in its ICIS-Air record, and the applicable subparts (NSPS are codified in 40 CFR Part 60, NESHAP in Parts 61 and 63) are the specific yardsticks against which its compliance is judged.

In our database the stationary-source roster is stored as the table epa_icis_air_facilities, holding 279,262 facilities, alongside a companion table of formal enforcement actions. The grain of the facility table is one row per regulated stationary source, keyed to its Registry ID. The columns capture the identity, classification, and compliance posture of each source:

registry_id          -- EPA FRS Registry ID (the cross-program facility key)
facility_name        -- standardized facility name
state / county       -- location of the source
latitude/longitude   -- geographic coordinates (for mapping / EJ overlay)
air_classification   -- Title V major, synthetic minor, true minor, etc.
is_major             -- flag: major source under the Clean Air Act
program_flags        -- subprograms that apply (NESHAP, NSPS, Title V, ...)
pollutants_regulated -- the pollutants the source is permitted/limited on
compliance_status    -- current 3-year compliance status string
hpv_flag             -- High Priority Violator indicator
days_last_fce        -- days since the last Full Compliance Evaluation
formal_actions       -- count/links to formal enforcement actions

The registry_id deserves the same emphasis it gets in every EPA dataset: it is the FRS Registry ID, the persistent identifier that the Facility Registry Service assigns to a physical site and that recurs across the air, water, hazardous-waste, and enforcement systems. It is the column that lets an air-compliance record be joined to the same facility's self-reported emissions and to the enforcement cases brought against it. The classification and compliance columns are the payload; the Registry ID is what turns a single-program compliance record into a node in a cross-program facility graph.

Source classification: major, synthetic minor, and minor

The single most consequential attribute in ICIS-Air is the source's classification, because classification determines which requirements apply and how intensively the source is monitored. Classification turns on a concept called potential to emit (PTE): the maximum amount of a pollutant a source could emit operating at full capacity, accounting for any physical or operational limits that are federally enforceable. PTE, not actual emissions, is the yardstick—a facility is classified by what it could emit if it ran flat out, subject to its enforceable constraints.

A major source is one whose potential to emit exceeds the applicable major-source threshold. For criteria pollutants the general threshold is 100 tons per year of any regulated pollutant (lower in nonattainment areas, which ratchet the threshold down as the severity of the nonattainment classification rises). For hazardous air pollutants the major-source thresholds are 10 tons per year of a single HAP or 25 tons per year of any combination of HAPs. A source that crosses any of these thresholds is a major source, must obtain a Title V operating permit, and is subject to the most demanding monitoring and the major-source NESHAP standards. Major sources are the heart of the program—a minority of all facilities by count, but the bulk of industrial emissions and the focus of EPA's compliance attention.
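The threshold logic above can be sketched as a small Python function. Treat it as a simplified illustration under stated assumptions, not a regulatory determination: potential-to-emit values are taken as already computed in tons per year; the only nonattainment adjustment modeled is the one for ozone precursors (VOC and NOx), where Section 182 sets major thresholds of 50, 25, and 10 tons per year in serious, severe, and extreme areas; ozone-transport-region rules and the nonattainment thresholds for other pollutants are ignored; and the function and dictionary names are hypothetical. The statute counts a source as major at the threshold ("100 tons per year or more"), so the comparisons use >=.

```python
# Hypothetical helper: a simplified major-source test. All potential-to-emit
# (PTE) values are in tons per year.

# Ozone-precursor (VOC/NOx) major thresholds by nonattainment classification
# (CAA Section 182). Ozone transport region rules are deliberately ignored.
OZONE_PRECURSOR_TPY = {
    "attainment": 100, "marginal": 100, "moderate": 100,
    "serious": 50, "severe": 25, "extreme": 10,
}
GENERAL_TPY = 100     # any regulated air pollutant
SINGLE_HAP_TPY = 10   # any single hazardous air pollutant
TOTAL_HAP_TPY = 25    # all HAPs combined


def is_major_source(criteria_pte, hap_pte, ozone_class="attainment"):
    """criteria_pte: {pollutant: tpy}; hap_pte: {hap_name: tpy}."""
    ozone_limit = OZONE_PRECURSOR_TPY[ozone_class]
    for pollutant, tpy in criteria_pte.items():
        limit = ozone_limit if pollutant in ("VOC", "NOx") else GENERAL_TPY
        if tpy >= limit:
            return True
    # HAP test: 10 tpy or more of one HAP, or 25 tpy or more of all HAPs.
    if any(tpy >= SINGLE_HAP_TPY for tpy in hap_pte.values()):
        return True
    return sum(hap_pte.values()) >= TOTAL_HAP_TPY


print(is_major_source({"VOC": 60}, {}, "serious"))  # 60 >= 50
print(is_major_source({"VOC": 60}, {}))             # 60 < 100
print(is_major_source({}, {"benzene": 8, "toluene": 9, "xylene": 9}))  # 26 >= 25
```

The last call shows why the combined-HAP test matters: no single HAP reaches 10 tons per year, yet the facility is major because the three together reach 25.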

A synthetic minor source is the analytically subtle category. It is a source whose unrestricted potential to emit would make it a major source, but which has voluntarily accepted federally enforceable permit limits—caps on hours of operation, throughput, fuel type, or emissions—that hold its potential to emit below the major threshold. By taking these limits, the source “synthesizes” minor status and escapes the full weight of Title V and the major-source standards. Synthetic minors are common and important: they are facilities that could be major but have legally committed to staying small, and their compliance hinges on honoring the very limits that keep them minor. A synthetic minor that exceeds its accepted cap is not merely over a permit limit—it has arguably been operating as an unpermitted major source.

A true minor (or natural minor) source is one whose potential to emit falls below the major threshold even without any synthetic limits—a genuinely small source by its inherent capacity. True minors are by far the most numerous category and the most lightly regulated; many are permitted by state minor-source programs or general permits rather than individual Title V permits, and they receive correspondingly less federal compliance scrutiny. A further category, the area source under Section 112, is a stationary source of hazardous air pollutants that is not a major source; area-source NESHAP standards are generally less stringent than the major-source MACT standards. The classification columns in ICIS-Air, read together with the program flags, tell an analyst not just how big a source is but which entire regime of requirements it lives under.
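The three-way split turns on comparing two potential-to-emit figures with the same threshold, which a short sketch makes explicit. The function name, its inputs, and the single 100 ton-per-year default are illustrative assumptions; a real determination compares each pollutant against its own threshold, including the hazardous-air-pollutant tests.

```python
def classify_source(unrestricted_pte, enforceable_pte, threshold=100.0):
    """Classify one pollutant's potential to emit (tons per year).

    unrestricted_pte: PTE running flat out with no permit caps.
    enforceable_pte:  PTE after federally enforceable limits (hours,
                      throughput, fuel, emission caps); equal to
                      unrestricted_pte when the source has no such limits.
    """
    if enforceable_pte > unrestricted_pte:
        raise ValueError("enforceable limits cannot raise potential to emit")
    if enforceable_pte >= threshold:
        return "major"
    if unrestricted_pte >= threshold:
        return "synthetic minor"   # below the line only because of its caps
    return "true minor"


for unrestricted, enforceable in [(400, 400), (400, 90), (40, 40)]:
    label = classify_source(unrestricted, enforceable)
    print(f"{unrestricted:>4} -> {enforceable:>4} tpy: {label}")
```

The middle case is the synthetic minor: the same 400-ton unrestricted capacity as the major source, held to 90 tons only by limits it has accepted.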

The Compliance Monitoring Strategy

Knowing which requirements apply to a source is only half the system; the other half is verifying compliance, and EPA structures that verification through its Compliance Monitoring Strategy (CMS). The CMS is the policy framework, implemented jointly by EPA and the delegated state and local air agencies, that sets how often and how intensively each class of source is to be evaluated. Its organizing unit is the compliance evaluation, and its most important instrument is the Full Compliance Evaluation.

A Full Compliance Evaluation (FCE) is a comprehensive assessment of a source's compliance with all applicable Clean Air Act requirements: a review of monitoring data, reports, and certifications combined, in most cases, with an on-site inspection. The CMS sets target frequencies by source class. Major sources are to receive an FCE on a roughly two-year cycle (the very largest and most complex major sources, the so-called mega-sites, on a somewhat longer one), while synthetic minor sources that emit or could emit at 80 percent or more of the major-source threshold, known as SM80s, are evaluated less frequently, on a multi-year cycle. The days_last_fce field in the table is the direct trace of this strategy: it records how long it has been since a source's last full evaluation, and comparing that figure against the CMS target cycle is how an analyst measures whether a facility is being evaluated on schedule.
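Comparing days_last_fce against a target cycle is simple arithmetic, but it needs a class-specific target and an explicit case for sources with no evaluation on record. In the sketch below the two-year figure for major sources follows the text; the five-year figure for synthetic minors emitting at or above 80 percent of the major threshold (SM80s), the class labels, the Source type, and the facility IDs are assumptions to check against the current CMS policy.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed CMS target cycles, in days; verify against the current strategy.
TARGET_CYCLE_DAYS = {
    "major": 2 * 365,   # major sources: roughly every two years
    "sm80": 5 * 365,    # SM80 synthetic minors: assumed five-year cycle
}


@dataclass
class Source:
    registry_id: str               # hypothetical IDs in the usage below
    cms_class: str                 # "major", "sm80", or anything else
    days_last_fce: Optional[int]   # None when no FCE date is recorded


def fce_status(src: Source) -> str:
    target = TARGET_CYCLE_DAYS.get(src.cms_class)
    if target is None:
        return "no CMS target"
    if src.days_last_fce is None:
        return "no FCE on record"   # unknown, never treated as fresh
    return "overdue" if src.days_last_fce > target else "on schedule"


sources = [
    Source("FAC-1", "major", 400),
    Source("FAC-2", "major", 1100),
    Source("FAC-3", "sm80", 1100),
    Source("FAC-4", "major", None),
    Source("FAC-5", "true_minor", 3000),
]
for s in sources:
    print(s.registry_id, fce_status(s))
```

The same 1,100-day gap is overdue for a major source but within cycle for an SM80, which is why the comparison has to be class-aware.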

Beyond the FCE, two other compliance instruments populate the record. Stack tests are direct measurements of emissions from a specific emission point: when a permit or an applicable standard calls for one, the source runs a stack test (also called a performance test) under controlled conditions to demonstrate that an emission unit meets its applicable limit. Stack-test results are how the program confirms that the actual emission rate, not just the paperwork, is within bounds. And under Title V, every major source must file an annual compliance certification: a signed statement by a responsible official certifying, term by term, whether the facility complied with each condition of its Title V permit over the year, and identifying any periods of deviation. The annual certification is the linchpin of the Title V self-monitoring model, the mechanism by which a major source attests, under penalty of law, to its own compliance status, and it is one of the documents an FCE reviews. Together the FCE, the stack test, and the annual certification are the evidentiary basis for the compliance-status and HPV fields that ICIS-Air exposes.
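How a stack test is scored depends on the applicable standard, but a common convention (the NSPS general provisions at 40 CFR 60.8(f)) is that a performance test consists of three separate runs and compliance is judged on their arithmetic mean. A minimal sketch under that assumption; the function name and the exactly-three-runs check are illustrative simplifications, and a specific subpart or permit may prescribe different averaging.

```python
from statistics import mean


def stack_test_result(run_rates, limit):
    """Score a performance test from its individual runs.

    run_rates: measured emission rate for each run, in the same units as
    limit (for example lb/hr or ppmvd). Assumes the three-run convention.
    """
    if len(run_rates) != 3:
        raise ValueError("this sketch expects exactly three test runs")
    average = mean(run_rates)
    return {"average": average, "limit": limit, "passes": average <= limit}


# One run above the limit does not by itself fail the test; the mean decides.
print(stack_test_result([4.2, 5.3, 4.4], limit=5.0))
print(stack_test_result([4.8, 5.4, 4.9], limit=5.0))
```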

High Priority Violators and the HPV policy

Not all noncompliance is equal, and EPA reserves a specific designation for the most serious cases: the High Priority Violator (HPV). The HPV flag is the single most important compliance signal in ICIS-Air, and understanding what it means is essential to reading the data. An HPV is not merely a source that is out of compliance—most minor permit deviations never rise to HPV status. The HPV designation is governed by EPA's High Priority Violator policy, which defines the criteria under which a Clean Air Act stationary-source violation is serious enough to warrant priority federal or state attention and a timely, appropriate enforcement response.

The policy designates a source as an HPV based on the nature and significance of the violation. The categories include violations of the major new-source review or prevention-of-significant-deterioration permitting requirements (building or modifying a major source without the required preconstruction permit and controls); violations of an applicable NESHAP or MACT standard for hazardous air pollutants; substantial violations of a Title V permit, an emission limit, or a compliance schedule; a source's failure to meet a federally enforceable consent agreement or order; chronic or recalcitrant noncompliance, meaning a pattern of violations rather than an isolated lapse; and violations that involve a significant amount of excess emissions or that, in the judgment of the regulator, threaten public health or the environment. The common thread is significance: the HPV flag is meant to separate the violations that matter most from the routine deviations that any large permitted facility accumulates.

Operationally, an HPV designation sets an enforcement clock running. The policy contemplates that an HPV will be addressed within a defined timeline through an appropriate enforcement response—and, where warranted, resolved through a formal action that returns the source to compliance and recovers a penalty. This is why the HPV flag and the formal-actions companion table belong together: an HPV designation is the leading edge, and a formal enforcement action—an administrative penalty order, a consent agreement, or a referral that becomes one of the cases in the enforcement-defendants dataset—is frequently the resolution. A facility flagged HPV with no formal action yet recorded is, in effect, a violation in flight; a facility with both is a violation that has been carried through to a formal response.
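The HPV flag and the formal-action count can be read together to sort facilities into enforcement stages. This sketch assumes a frame with the hpv_flag and formal_actions columns from the table described earlier, with formal_actions taken to be a count; the facilities and stage labels are invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented rows; hpv_flag and formal_actions mirror the table's columns,
# with formal_actions assumed to be a count of formal actions.
facilities = pd.DataFrame({
    "registry_id":    ["A", "B", "C", "D"],
    "hpv_flag":       [True, True, False, False],
    "formal_actions": [0, 2, 1, 0],
})

hpv = facilities["hpv_flag"]
acted = facilities["formal_actions"] > 0
# The three conditions are mutually exclusive; anything else gets the default.
facilities["stage"] = np.select(
    [hpv & ~acted, hpv & acted, ~hpv & acted],
    ["HPV, no formal action yet (in flight)",
     "HPV carried to formal action",
     "formal action, not currently HPV"],
    default="no HPV, no formal action",
)
print(facilities[["registry_id", "stage"]])
```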

How ICIS-Air joins to enforcement and emissions

The reason to treat ICIS-Air as more than a standalone compliance list is the Registry ID. Because every ICIS-Air facility carries its FRS Registry ID, and because the same key appears in the other EPA datasets, the air-compliance record can be joined into a single facility profile that spans permitting, emissions, and enforcement. Three joins are especially powerful.

Joining ICIS-Air to the enforcement-defendants dataset closes the loop between a violation and the case it produced. An HPV designation in ICIS-Air often ripens into a formal enforcement action; the enforcement case names the defendants, records the statute and the penalty, and tracks the compliance schedule. Because the enforcement case is linked, through ICIS, back to the facility and its Registry ID, an analyst can move from “this source is a High Priority Violator” to “here is the Clean Air Act case the United States brought against the company that operates it, the penalty assessed, and the controls it agreed to install.” The two datasets are the before-and-after of the same enforcement story: ICIS-Air shows the compliance state, the defendants table shows the formal response.

Joining ICIS-Air to the pollutant-emissions dataset—the consolidated National Emissions Inventory and Toxics Release Inventory record—turns a compliance classification into a quantified emission load. ICIS-Air tells you that a facility is a Title V major source subject to a particular NESHAP; the emissions table tells you how many tons of criteria pollutants and how many pounds of hazardous air pollutants that facility actually reported. The combination is what lets an analyst ask the sharpest questions: among the largest emitters of a carcinogenic HAP, which are flagged as High Priority Violators? Which major sources carry both a heavy emission load and a stale Full Compliance Evaluation? Neither dataset answers those alone; the Registry-ID join does.
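A minimal pandas sketch of the Registry-ID join, assuming a facility frame shaped like the ICIS-Air table and a long-format emissions frame (one row per facility and pollutant). The emissions column names, the invented values, and the 10-ton "heavy" cutoff are all hypothetical. Aggregating emissions to one row per facility before merging, and asking merge to validate the one-to-one relationship, keeps the join from silently multiplying rows.

```python
import pandas as pd

# Invented ICIS-Air-style rows (one per facility) and long emissions rows.
air = pd.DataFrame({
    "registry_id":   ["110001", "110002", "110003"],
    "is_major":      [True, True, True],
    "hpv_flag":      [True, False, False],
    "days_last_fce": [1200, 300, 900],
})
emissions = pd.DataFrame({
    "registry_id": ["110001", "110002", "110003", "110003"],
    "pollutant":   ["Benzene", "Benzene", "Benzene", "1,3-Butadiene"],
    "tons":        [12.0, 30.0, 8.0, 3.5],
})

# Roll emissions up to one row per facility for the HAP of interest.
benzene = (emissions[emissions["pollutant"] == "Benzene"]
           .groupby("registry_id", as_index=False)["tons"].sum())
profile = air.merge(benzene, on="registry_id", how="left",
                    validate="one_to_one")

# Largest benzene emitters that are also flagged HPV.
top = profile.nlargest(2, "tons")
top_hpv = top[top["hpv_flag"]]

# Major sources with a heavy load AND a stale FCE (past ~730 days).
heavy_and_stale = profile[(profile["tons"] >= 10)
                          & (profile["days_last_fce"] > 730)]
print(top_hpv[["registry_id", "tons"]])
print(heavy_and_stale[["registry_id", "tons", "days_last_fce"]])
```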

The same key reaches further—to RCRA hazardous-waste handler status, to NPDES water discharge permits, to the Risk Management Plan database—so that a single industrial site can be assembled as one object with an air-compliance posture, an emission profile, an enforcement history, and a waste footprint. The classification and HPV fields of ICIS-Air are the air-compliance facet of that object; the Registry ID is the hinge that connects it to all the others.

Environmental justice overlays

Because every ICIS-Air facility carries latitude and longitude through FRS, the dataset supports geographic analysis, and the most consequential geographic analysis is environmental justice (EJ) screening. EPA's EJScreen tool combines environmental indicators with demographic indicators—the share of low-income and minority residents in a block group or tract—to identify communities that bear a disproportionate environmental burden. ICIS-Air contributes a distinctive layer to that picture: not just how much pollution is emitted near a community, but how many major air sources operate there and how many of them are out of compliance.

The recurring empirical finding, across decades of research and EPA's own analysis, is that major stationary sources are not randomly distributed. Refineries, chemical plants, incinerators, and heavy industry are concentrated disproportionately in low-income communities and communities of color—the “fenceline” populations that live in the immediate shadow of industrial facilities. By geocoding the ICIS-Air major sources, intersecting them with census geography, and overlaying EJScreen's demographic indices, an analyst can quantify this concentration directly: the count and density of Title V major sources, NESHAP-regulated facilities, and High Priority Violators falling within a given distance of overburdened communities, compared against the population as a whole.
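The distance step of such an overlay can be sketched with the haversine great-circle formula. The sketch assumes each community is represented by a single point (for example, a block-group centroid), uses invented coordinates and HPV flags, and treats the 5-kilometer radius as a placeholder; production screening would more often buffer the facilities and intersect the buffers with census polygons in a GIS library.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points given in degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sources_near(point, sources, radius_km):
    # Keep the sources whose coordinates fall within radius_km of point.
    lat0, lon0 = point
    return [s for s in sources
            if haversine_km(lat0, lon0, s["lat"], s["lon"]) <= radius_km]


# Invented major sources around a hypothetical community centroid.
majors = [
    {"id": "M1", "lat": 29.76, "lon": -95.11, "hpv": True},
    {"id": "M2", "lat": 29.90, "lon": -95.10, "hpv": False},
    {"id": "M3", "lat": 29.74, "lon": -95.08, "hpv": False},
]
nearby = sources_near((29.75, -95.10), majors, radius_km=5.0)
print(len(nearby), "major sources within 5 km;",
      sum(s["hpv"] for s in nearby), "flagged HPV")
```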

This is the data substrate beneath a great deal of contemporary air-policy work—the cumulative-burden analyses that motivate enhanced monitoring in fenceline communities, the targeting of inspection and enforcement resources toward overburdened areas, and the evaluation of whether the compliance system is delivering equitable protection. The combination of compliance status, HPV flag, and precise location is what makes ICIS-Air uniquely suited to it: it allows the question to shift from “how much pollution is emitted here” to “are the facilities here being held to the same compliance standard as facilities elsewhere,” which is the question at the heart of environmental justice.

Analytical uses

A facility-level table of every stationary air source, carrying classification, compliance status, the HPV flag, and the last-evaluation date, supports a wide range of analysis, from a single-facility lookup to a national audit of the compliance system itself.

Mapping major sources by state and industry. The most direct use is an inventory: how many Title V major sources, and how many NESHAP- or NSPS-regulated facilities, operate in a given state, county, or industry sector. Because the table is facility-resolved and geocoded, the output is a concrete map of where the heavy air sources are—not an abstract sectoral count but a named, located set of facilities that can be filtered by classification and program.
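An inventory of this kind reduces to counting facilities by state and program. This sketch assumes program_flags arrives as a comma-separated string (the column list above does not fix its format), and the rows and program labels are invented.

```python
import pandas as pd

facilities = pd.DataFrame({
    "registry_id":   ["1", "2", "3", "4"],
    "state":         ["TX", "TX", "LA", "LA"],
    "program_flags": ["TITLE V, NESHAP", "NSPS",
                      "TITLE V, NSPS, NESHAP", "TITLE V"],
})

# One row per (facility, program), then count facilities per state/program.
per_program = (facilities
               .assign(program=facilities["program_flags"]
                       .str.split(r",\s*", regex=True))
               .explode("program"))
counts = pd.crosstab(per_program["state"], per_program["program"])
print(counts)
```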

Finding High Priority Violators. Filtering on the HPV flag surfaces the facilities EPA and the states have designated as serious violators, the priority enforcement targets. Combined with the classification field, this answers sharper questions: which major sources are HPVs, what is the HPV rate among major sources in a state, and how does that rate vary across states and industries? An HPV rate is a coarse but meaningful indicator of the compliance posture of a regulated population.

FCE coverage gaps. The days_last_fce field, compared against the CMS target cycle, exposes the evaluation backlog: major sources that have gone longer than the roughly two-year target without a Full Compliance Evaluation. A high proportion of stale evaluations in a state is a signal that the compliance-monitoring system is under-resourced or behind—a facility that has not been fully evaluated in years could be in serious violation without it being recorded. Coverage-gap analysis turns the dataset into a tool for auditing the regulators, not just the regulated.

Linking facilities to emissions and enforcement. As described above, the Registry ID lets the compliance record be joined to the emissions table and the enforcement-defendants table. This is where the most valuable composite questions live: which facilities combine a heavy reported emission load, an HPV designation, and an open enforcement case? Such a facility is a specific, identifiable priority that no single dataset surfaces on its own.

Environmental justice analysis. Finally, the geocoded major sources and HPVs can be intersected with demographic geography to quantify the concentration of heavy and noncompliant air sources in overburdened communities—the cumulative-burden and equitable-enforcement analysis that the location, classification, and compliance fields together make possible.

Python workflow: querying the ECHO air API

EPA exposes the ICIS-Air stationary-source data through ECHO's air web services at echodata.epa.gov/echo. The pattern is a query-handle one: a facility-search endpoint returns a query ID (QID) plus a first page of facility rows, and a paging endpoint walks the rest of the result set behind that QID. The script below pulls the Clean Air Act stationary sources in a state, flattens the fields that matter—classification, the major-source flag, compliance status, the HPV flag, and days since the last Full Compliance Evaluation—and then computes two of the analyses described above: the High Priority Violator rate among Title V major sources by state, and the share of major sources that have gone past the roughly two-year FCE cycle, which is the coverage gap made visible. No API key is required for public data. Because the exact air-service field and column names evolve between ECHO releases, the script isolates them in one place, and any production use should be validated against the current air-service documentation; for genuinely national-scale work the ECHO bulk data download is far more efficient than paging the service state by state.

import requests
import pandas as pd

# EPA ECHO Air REST Services -- Clean Air Act stationary sources drawn
# from the air module of ICIS (ICIS-Air), surfaced through ECHO.
# No API key is required for public data. Two endpoints are used here:
#   get_facilities -- search/summary of air facilities (returns a QID)
#   get_qid        -- page through the result set behind a QID
# (get_facility_info can be added for full single-facility detail.)
BASE = "https://echodata.epa.gov/echo"
PAGE_SIZE = 1000      # one page size for the first call AND every later page
STATES = ["TX"]       # two-letter state codes; add more to compare states

# Response field names evolve between ECHO releases, so they are isolated
# here: each output column lists candidate field names, tried in order.
# Confirm them against the current air-service documentation.
FIELDS = {
    "registry_id":       ("RegistryID", "FacRegistryID"),
    "name":              ("FacName",),
    "state":             ("FacState",),
    # Air program classification (major / synthetic minor / minor /
    # NESHAP / NSPS) arrives as a classification string or code set.
    "air_class":         ("AIRClassification", "FacAirClassification"),
    "is_major":          ("AIRMajorFlag", "FacMajorFlag"),
    "compliance_status": ("AIR3yrComplStatus", "FacComplianceStatus"),
    # High Priority Violator flag -- the headline noncompliance signal.
    "is_hpv":            ("AIRHpvStatus", "FacHpvStatus"),
    # Days since the last Full Compliance Evaluation (FCE).
    "days_last_fce":     ("AIRDaysLastEvaluation", "FacDaysLastEvaluation"),
}


def air_facilities(state, rows=PAGE_SIZE):
    # Filter to Clean Air Act stationary sources in one state. The
    # parameter names below are the documented ECHO air-service filters;
    # confirm against the live air-service schema, which evolves.
    params = {
        "output": "JSON",
        "p_st": state,            # two-letter state, e.g. "TX"
        "p_act": "Y",             # active facilities only
        "responseset": rows,      # size of the first page of rows
    }
    r = requests.get(
        f"{BASE}/air_rest_services.get_facilities", params=params, timeout=120
    )
    r.raise_for_status()
    return r.json()


def air_qid_page(qid, pageno, count=PAGE_SIZE):
    # Page through the rows behind a query handle (QID). Pages are counted
    # in units of `count` rows, so `count` must equal the first call's
    # responseset or pages overlap. No qcolumns filter is sent, so the
    # service's default column set comes back rather than a numbered
    # subset that might omit the fields read below.
    params = {"output": "JSON", "qid": qid,
              "pageno": pageno, "responseset": count}
    r = requests.get(
        f"{BASE}/air_rest_services.get_qid", params=params, timeout=120
    )
    r.raise_for_status()
    return r.json()


def is_true(v):
    return str(v or "").strip().upper() in ("Y", "YES", "1", "TRUE")


def pick(f, column):
    # First candidate field that is present and non-empty, else None.
    for key in FIELDS[column]:
        if f.get(key) not in (None, ""):
            return f[key]
    return None


def fetch_state(state):
    # The search call returns a QID plus page 1; walk the remaining pages.
    res = air_facilities(state).get("Results", {})
    qid = res.get("QueryID")
    facs = list(res.get("Facilities") or [])
    total = int(res.get("QueryRows") or len(facs))
    pageno = 2
    while qid and len(facs) < total:
        more = air_qid_page(qid, pageno).get("Results", {}).get("Facilities") or []
        if not more:
            break
        facs.extend(more)
        pageno += 1
    return facs


# --- 1. Pull each state's air facilities and flatten the rows we care about ---
rows = []
for st in STATES:
    for f in fetch_state(st):
        row = {col: pick(f, col) for col in FIELDS}
        row["state"] = row["state"] or st
        rows.append(row)

df = pd.DataFrame(rows)
if df.empty:
    raise SystemExit("No facility rows -- inspect the air-service response shape.")
# Normalize types once, at the column level; a missing date becomes NaN.
df["is_major"] = df["is_major"].map(is_true).astype(bool)
df["is_hpv"] = df["is_hpv"].map(is_true).astype(bool)
df["days_last_fce"] = pd.to_numeric(df["days_last_fce"], errors="coerce")


# --- 2. HPV rate among Title V major sources, by state ---
# Restrict to major sources, then compute the share flagged HPV. Numerator
# and denominator both count rows, so they stay on the same basis.
major = df[df["is_major"]]
hpv_rate = (
    major.groupby("state")
    .agg(majors=("is_hpv", "size"),
         hpvs=("is_hpv", "sum"))
    .assign(hpv_pct=lambda d: d["hpvs"] / d["majors"])
    .sort_values("hpv_pct", ascending=False)
)
print("HPV rate among Title V major sources, by state:")
for st, r in hpv_rate.iterrows():
    print(f"  {st:<4} {int(r['hpvs']):>5} / {int(r['majors']):<6}  {r['hpv_pct']:6.1%}")


# --- 3. Full Compliance Evaluation coverage gap, by state ---
# EPA's Compliance Monitoring Strategy targets an FCE for every major
# source on a roughly two-year cycle (730 days). Flag majors that have
# gone longer than that -- the FCE backlog made visible. Majors with no
# last-evaluation date are counted separately, never treated as fresh.
FCE_CYCLE_DAYS = 730
coverage = (
    major.assign(dated=major["days_last_fce"].notna(),
                 stale=major["days_last_fce"] > FCE_CYCLE_DAYS)
    .groupby("state")
    .agg(dated=("dated", "sum"), stale=("stale", "sum"),
         total=("dated", "size"))
    .assign(gap=lambda d: d["stale"] / d["dated"].where(d["dated"] > 0))
    .sort_values("gap", ascending=False)
)
print("\nMajor sources past the ~2-year FCE cycle, by state:")
for st, c in coverage.iterrows():
    undated = int(c["total"] - c["dated"])
    print(f"  {st:<4} {int(c['stale']):>5} / {int(c['dated']):<6}  {c['gap']:6.1%} past cycle"
          f"  ({undated} with no FCE date)")

Two practical notes. First, the paging loop matters: a large industrial state has tens of thousands of air facilities, well beyond a single response, so production code must walk the full QID result set rather than assuming the first page is complete. Second, the two metrics are only as meaningful as their denominators. The HPV rate should be computed over major sources specifically—an HPV rate over all facilities, the great majority of which are lightly regulated true minors, dilutes the signal—and the FCE coverage gap should be computed only over the records that actually carry a last-evaluation date, reported as such, rather than treating a missing date as a fresh evaluation.
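The two denominator rules can be seen on a ten-row toy frame with invented values: the same single HPV reads as 10 percent of all facilities but 25 percent of major sources, and dividing by every major (which quietly treats a missing FCE date as fresh) understates the coverage gap.

```python
import pandas as pd

toy = pd.DataFrame({
    "is_major":      [True] * 4 + [False] * 6,
    "is_hpv":        [True] + [False] * 9,
    "days_last_fce": [900, 400, None, 1000, None, None, 2000, None, None, None],
})
FCE_CYCLE_DAYS = 730

majors = toy[toy["is_major"]]
hpv_all = toy["is_hpv"].mean()         # diluted by the minor sources
hpv_major = majors["is_hpv"].mean()    # the meaningful rate

dated = majors["days_last_fce"].dropna()
stale = int((dated > FCE_CYCLE_DAYS).sum())
undated = int(majors["days_last_fce"].isna().sum())
naive_gap = stale / len(majors)        # counts the undated as if fresh
gap = stale / len(dated)               # over dated records only

print(f"HPV rate: {hpv_all:.0%} of all facilities vs {hpv_major:.0%} of majors")
print(f"FCE gap: {gap:.0%} of {len(dated)} dated majors "
      f"({undated} undated); naive {naive_gap:.0%}")
```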

Limitations and analytical caveats

ICIS-Air is the most comprehensive public picture of Clean Air Act stationary-source compliance in the United States, but it carries structural limitations that an analyst must internalize before drawing conclusions from it.

Compliance is heavily self-certified. The Title V model rests on the annual compliance certification—a source attesting to its own compliance, term by term—and on self-reported monitoring data. EPA and the states audit and inspect, but the day-to-day compliance record is built substantially on what facilities report about themselves. A “compliant” status reflects the absence of a recorded violation, which is not the same thing as verified compliance with every permit term at every moment. The certification carries legal weight and false certification is itself a serious violation, but the system is one of attested compliance, not continuous independent measurement.

There is an FCE backlog. The Compliance Monitoring Strategy sets target evaluation frequencies, but resources are finite, and full compliance evaluations are frequently behind schedule—the days_last_fce field often exceeds the target cycle for a meaningful share of sources. This cuts two ways for the analyst. It is a legitimate finding in its own right (the coverage-gap analysis above), but it also means that the compliance status of a source with a stale evaluation is less reliable: a violation that began after the last FCE may simply not have been detected yet. A clean status on a long-unevaluated facility is weaker evidence than a clean status on a recently evaluated one.

Minor sources are under-covered. The program's attention, and ICIS-Air's data depth, concentrate on major sources. True minor sources are numerous but lightly tracked—many are permitted through state minor-source or general-permit programs with limited federal data flow—and the smallest sources may not appear in the federal system at all. The aggregate air-quality impact of the many small sources can be substantial, yet they are sparsely represented here. Any analysis of the “full” population of air sources must acknowledge that the table is densest exactly where the regulatory program is most intensive, and thinnest at the small-source tail.

Registry-ID matching is imperfect. The power of the dataset depends on the Registry ID being populated and correct, so that ICIS-Air joins cleanly to the emissions and enforcement tables. In practice, FRS matching is not flawless: a single physical facility can occasionally carry more than one Registry ID, distinct facilities can be conflated, and a small share of records may lack a clean cross-program link. Joins by Registry ID are therefore highly reliable in the aggregate but not guaranteed at the level of any single facility, and a careful cross-program analysis should account for unmatched and duplicate records rather than assuming a perfect one-to-one correspondence.
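Before trusting a cross-program join, it helps to count what did not join cleanly. This sketch, on invented rows, flags Registry IDs that repeat within the air table, records with no ID at all, and (through merge's indicator column) IDs present on only one side. It catches key-level problems only; one physical site split across two IDs would still need name and location matching to detect.

```python
import pandas as pd

air = pd.DataFrame({
    "registry_id":   ["110001", "110002", "110002", None],
    "facility_name": ["Plant A", "Plant B", "Plant B Unit 2", "Plant D"],
})
emissions = pd.DataFrame({"registry_id": ["110001", "110003"],
                          "tons": [12.0, 5.0]})

no_id = air[air["registry_id"].isna()]
keyed = air.dropna(subset=["registry_id"])
repeated = keyed[keyed.duplicated("registry_id", keep=False)]

# Join on the distinct IDs; _merge records which side(s) each ID came from.
audit = (keyed[["registry_id"]].drop_duplicates()
         .merge(emissions, on="registry_id", how="outer", indicator=True))
print(f"{len(no_id)} air record(s) without a Registry ID")
print(f"{repeated['registry_id'].nunique()} Registry ID(s) repeated in the air table")
print(audit["_merge"].value_counts())
```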

Held with these caveats in mind, epa_icis_air_facilities remains a uniquely powerful resource: a facility-resolved, classification-aware, compliance-tracked record of 279,262 stationary air sources, keyed to the Registry ID that ties each one to its emissions and its enforcement history. It is the data substrate beneath Clean Air Act compliance monitoring, High Priority Violator targeting, and the environmental-justice scrutiny of where America's heaviest air sources operate and how rigorously they are held to account.

Related writing

EPA Pollutant Emissions: The Federal Database Behind 10 Million Facility-Level Air and Toxic Release Records — ICIS-Air records whether a source is in compliance and how it is classified; the emissions table records how much that same facility actually puts into the air, and because both share the FRS Registry ID, a Title V major source's compliance posture can be joined directly to its reported criteria-pollutant and hazardous-air-pollutant load.

EPA Enforcement Defendants: The Federal Database Behind 200,000 Environmental Cases — A High Priority Violator designation in ICIS-Air is frequently the leading edge of a formal Clean Air Act enforcement action, and the defendants table names the companies and individuals the United States pursued, the penalties assessed, and the controls agreed to—the formal response to the noncompliance ICIS-Air flags.

EPA RCRA Hazardous Waste Data: The Federal Database Behind 400,000 Regulated Facilities — The same industrial facilities that ICIS-Air tracks as major air sources are frequently large hazardous-waste generators, and the shared Registry ID lets an analyst assemble a single site's air-compliance posture and its cradle-to-grave waste footprint into one cross-program facility profile.