EPA Safe Drinking Water Act Site Visits: The Federal Record of Public Water System Inspections

Before a public water system ever incurs a drinking-water violation, a sanitary surveyor usually walks the site (the wellhead, the chlorination room, the storage tank, the pump house, the operator's logbook) and writes down what is wrong. EPA stores that on-site inspection record in the Safe Drinking Water Information System: roughly 433,150 site visits, one row per inspection, each keyed to a public water system ID and scored across the same eight evaluation areas. This is the upstream half of the country's drinking-water compliance machinery; the violations record sees only the downstream end.

This article covers what the site-visits dataset is and how the Safe Drinking Water Act frames it; the difference between EPA's national rule-writing role and the state primacy agencies that actually conduct the inspections; the three types of public water system and the size categories that govern oversight intensity; the sanitary survey itself—the periodic on-site inspection, its three-to-five-year cadence, and its eight evaluation elements—and how site visits feed the violations and enforcement record; the Lead and Copper Rule Revisions and the 2024 national PFAS drinking-water standard as forces reshaping what inspectors look for; how the site-visits table joins to the SDWA violations dataset and the system inventory through the PWSID; a Python workflow that pulls site visits and violations from the ECHO/SDWIS API and computes visits-per-system and the deficiency-to-violation pipeline by state; and the caveats—state reporting variation, primacy-agency data lag, and small-system under-coverage—that every analyst must internalize before drawing conclusions.

What the dataset is

The Safe Drinking Water Information System, universally abbreviated SDWIS, is EPA's national database for the public drinking-water program. It holds the inventory of public water systems, the contaminant monitoring results those systems report, the violations they incur, the enforcement actions taken against them, and—the subject of this piece—the record of on-site inspections that state regulators conduct at the systems. Those inspections are recorded as site visits, and the dominant kind of site visit is the sanitary survey: a comprehensive, periodic, in-person evaluation of a water system's physical infrastructure, operations, and records. Surfaced through EPA's ECHO platform and the SDWIS public extracts, the site-visits record comprises roughly 433,150 rows.

In our database this record is stored as the table epa_sdwa_site_visits, with the grain of one row per inspection: a single water system inspected every three years over a decade contributes three or four rows, one per visit. The columns capture who was inspected, when, why, and what the inspector found across the standard evaluation areas:

pwsid                       -- public water system ID (state code + 7 digits)
primacy_agency_code         -- the state / primacy agency that conducted the visit
visit_date                  -- date the on-site inspection occurred
visit_reason_code           -- sanitary survey, follow-up, or other
-- evaluation-area findings (one indicator per standard area):
source_water_eval_code      -- source: wells, intakes, wellhead protection
treatment_eval_code         -- treatment process and disinfection
distribution_system_eval    -- the pipe network and pressure
finished_water_storage_eval -- tanks and reservoirs of treated water
pumps_eval_code             -- pumps, pumping facilities, controls
monitoring_reporting_eval   -- sampling, lab, and reporting practices
management_ops_eval_code    -- management, recordkeeping, operations
operator_compliance_eval    -- certified-operator staffing and compliance

The pwsid is the load-bearing column. The public water system identification number is a persistent identifier—a two-character state or primacy-agency code followed by seven digits—assigned to every regulated public water system. It is the key that ties a site visit to the same system's inventory record, its monitoring results, and, most importantly for analysis, its violations. Because the PWSID appears on every SDWIS table, an inspection finding recorded in epa_sdwa_site_visits can be joined directly to the violations the same system subsequently incurs. The visit_reason_code distinguishes the routine, scheduled sanitary survey from a follow-up visit (returning to confirm that a previously identified deficiency has been corrected) and from other visit types (complaint investigations, special evaluations, training visits). The eight evaluation-area columns are the substantive payload: each records the inspector's assessment of one of the standard areas of a water system, and together they are what turn a site visit from a bare attendance record into a diagnostic of where a system is weak.
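
Because every join below depends on the PWSID, it is worth checking its shape before merging tables. The following is a minimal sketch, assuming IDs arrive as strings that may carry stray whitespace or lowercase letters. The helper names normalize_pwsid and split_pwsid and the sample IDs are hypothetical, and allowing digits as well as letters in the two-character prefix is an assumption rather than a rule taken from the SDWIS documentation.

```python
import re

# Assumed shape, from the description above: a two-character prefix followed
# by seven digits. Letters or digits are accepted in the prefix (assumption).
_PWSID_RE = re.compile(r"^[A-Z0-9]{2}\d{7}$")


def normalize_pwsid(raw):
    """Return a trimmed, upper-cased PWSID, or None if the shape is wrong."""
    if raw is None:
        return None
    candidate = str(raw).strip().upper()
    return candidate if _PWSID_RE.match(candidate) else None


def split_pwsid(pwsid):
    """Split an already-validated PWSID into (prefix, seven-digit serial)."""
    return pwsid[:2], pwsid[2:]


# Hypothetical sample IDs: one untidy but valid, one too short, one clean.
for raw in [" tx1234567 ", "TX123456", "CA7654321"]:
    print(repr(raw), "->", normalize_pwsid(raw))
```

Rejecting malformed IDs up front keeps a silent join miss from being mistaken for a system with no inspections.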

What it is and the SDWA regulatory frame

The Safe Drinking Water Act (SDWA) was enacted in 1974 as the federal government's answer to a basic public-health gap: the country had no national, enforceable standards for the safety of tap water delivered by public water systems. Before the SDWA, drinking-water standards were a patchwork of non-binding federal guidance and uneven state rules. The 1974 Act gave EPA authority to set enforceable national standards for contaminants in drinking water and to require the systems that deliver water to the public to monitor for those contaminants, treat the water to meet the standards, and notify the public when they fail.

Two later rounds of amendments shaped the program into its present form. The 1986 amendments dramatically accelerated EPA's standard-setting, requiring the agency to regulate dozens of specified contaminants on a schedule, mandating disinfection and filtration for surface-water systems, and banning the use of lead pipe, solder, and flux in new and repaired public water systems, which is the statutory origin of the lead-in-drinking-water program. The 1996 amendments rebalanced the program: they introduced a risk- and cost-based framework for deciding which new contaminants to regulate (the Contaminant Candidate List and the regulatory determination process), created the Drinking Water State Revolving Fund to finance infrastructure improvements, required systems to provide customers with annual Consumer Confidence Reports, and, crucially for this dataset, strengthened the operator-certification and capacity-development requirements that sanitary surveys evaluate.

The standards EPA sets under this authority are the National Primary Drinking Water Regulations (NPDWRs). Each NPDWR sets, for a regulated contaminant, either a maximum contaminant level (MCL)—an enforceable ceiling on the concentration permitted in water delivered to consumers—or a treatment technique, a required process used when it is not economically or technically feasible to measure the contaminant at the relevant level (the Lead and Copper Rule and the Surface Water Treatment Rule are treatment-technique rules rather than simple MCL rules). The NPDWRs also impose monitoring and reporting schedules: how often a system must sample for each contaminant, where, and by when it must report the results. The site-visit record exists in service of these regulations—the sanitary survey is, in large part, an on-site check of whether a system is physically and operationally capable of meeting its NPDWRs.

The most important structural fact about the SDWA, and the one that explains the shape of the site-visits data, is primacy. EPA writes the national regulations, but it does not, in the ordinary course, run the program on the ground. Instead, states (and a few tribes and territories) apply for and receive primary enforcement responsibility, known as primacy, to administer the drinking-water program within their borders. To obtain primacy a state must adopt regulations at least as stringent as the federal NPDWRs and demonstrate the capacity to enforce them. Nearly all states hold primacy. The practical consequence is that the state primacy agency conducts the inspections: it is a state drinking-water program employee, not an EPA inspector, who performs the sanitary survey, scores the evaluation areas, cites the deficiencies, and takes most enforcement actions. The state then reports those results up to EPA, where they populate SDWIS. This is why the dataset is keyed by a primacy_agency_code and why, as the caveats section will stress, the completeness and consistency of the data vary from one primacy agency to the next: the federal record is an aggregation of fifty-odd state programs reporting in.

Public water system types and size categories

The SDWA does not regulate all water—it regulates public water systems, defined as systems that provide water for human consumption through pipes or other constructed conveyances to at least fifteen service connections or that regularly serve at least twenty-five people. A private household well serving a single family is not a public water system and does not appear in SDWIS. Within the universe of public water systems, the regulations draw a consequential three-way distinction by the population the system serves and how consistently it serves them, because the health risk—and therefore the regulatory intensity and the focus of a sanitary survey—differs sharply by type.

Community water systems (CWS) serve the same population year-round: the municipal utilities, water districts, mobile-home parks, and subdivisions that supply people's homes. They are the systems of greatest concern because their customers drink the water every day for years, so chronic, low-level exposure to contaminants like lead, arsenic, nitrate, disinfection byproducts, and now PFAS accumulates. Community water systems are subject to the full battery of NPDWRs and receive the most thorough and frequent sanitary surveys.

Non-transient non-community water systems (NTNCWS) serve at least twenty-five of the same people for at least six months of the year, but not as their residence. The archetypes are schools, factories, office buildings, and hospitals that operate their own well. Because the same people (children at a school, workers at a plant) drink the water repeatedly over long periods, NTNCWSs are held to most of the chronic-exposure standards that apply to community systems.

Transient non-community water systems (TNCWS) serve transient populations, meaning different people who do not stay long: highway rest stops, campgrounds, gas stations, restaurants, and parks with their own water source. Because no individual is exposed for long, transient systems are regulated only for the contaminants that pose an acute, short-term risk, principally microbial pathogens (the Total Coliform Rule) and nitrate, and their sanitary surveys are correspondingly narrower.

Cutting across the three types is a size category based on the population served, which strongly influences oversight. EPA and the states commonly bin systems into very small (serving 500 or fewer people), small (501–3,300), medium (3,301–10,000), large (10,001–100,000), and very large (more than 100,000). The distribution is heavily skewed: the United States has on the order of 150,000 public water systems, the large majority of which are very small or small, while a small number of large and very large community systems serve the bulk of the population. This skew matters enormously for interpreting the site-visits data. The large systems have professional staff, dedicated compliance personnel, and the resources to host frequent, well-documented sanitary surveys; the very small systems (a rural water association run by a part-time operator, a campground well) are the ones most likely to be inspected infrequently, to have unaddressed deficiencies, and to be under-represented or inconsistently reported in the data. Any analysis that ranks systems or states on inspection frequency without normalizing for this size distribution will mostly be measuring the shape of the system inventory rather than the diligence of the inspectors.
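
The size bins and the normalization they call for can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: the function names, the two input dictionaries (population served and visit counts, each keyed by PWSID), and the sample values are all hypothetical, while the bin boundaries are the ones listed above.

```python
from collections import defaultdict


def size_category(population_served):
    """Map a population-served count to the five size bins listed above."""
    if population_served is None or population_served < 0:
        raise ValueError("population_served must be a non-negative number")
    if population_served <= 500:
        return "very small"
    if population_served <= 3300:
        return "small"
    if population_served <= 10000:
        return "medium"
    if population_served <= 100000:
        return "large"
    return "very large"


def visits_per_system_by_size(population_by_pwsid, visits_by_pwsid):
    """Average recorded visits per inventoried system within each size bin.

    Systems with no recorded visit count as zero, so the denominator is the
    inventory, not just the systems that happen to appear in the visit table.
    """
    totals = defaultdict(lambda: [0, 0])  # bin -> [visit count, system count]
    for pwsid, population in population_by_pwsid.items():
        bucket = totals[size_category(population)]
        bucket[0] += visits_by_pwsid.get(pwsid, 0)
        bucket[1] += 1
    return {name: visits / systems for name, (visits, systems) in totals.items()}


# Hypothetical inputs: three systems, one of which has no recorded visit.
populations = {"TX0000001": 300, "TX0000002": 450, "TX0000003": 50000}
visit_counts = {"TX0000001": 2, "TX0000003": 4}
print(visits_per_system_by_size(populations, visit_counts))
```

Comparing states bin by bin, rather than on a single statewide average, keeps a state with many campground wells from looking negligent next to a state dominated by large utilities.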

The sanitary survey and the deficiency-to-violation pipeline

The sanitary survey is the heart of the site-visits dataset. It is an on-site review of a public water system's entire operation, conducted in person by the primacy agency, to evaluate the system's capability to consistently produce and deliver safe drinking water. Unlike a violation, which records a discrete failure to meet a standard, the sanitary survey is forward-looking: it is meant to find weaknesses (a cracked well casing, an unprotected wellhead, a storage tank with a failing hatch, a disinfection system without a backup, sloppy sampling practices, an operator without the required certification) before they cause a contamination event or a measured exceedance. It is the program's principal preventive instrument.

EPA regulations and the agency's guidance set the cadence. Community water systems must receive a sanitary survey at least every three years; the interval can be extended to five years for community systems that use only protected and disinfected groundwater and have an outstanding performance and compliance record. Non-community systems are surveyed at least every five years. These intervals are why the site-visit cadence in the data is so regular for well-run systems and why a gap of more than five years between visits for a given PWSID is itself a notable analytic signal—it suggests either a system that has fallen through the cracks or a primacy agency whose inspection program is under-resourced.
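
The cadence check described above reduces to sorting a system's survey dates and measuring the gaps. The sketch below assumes the visit list has already been filtered to sanitary surveys (the coded values of visit_reason_code are not assumed here), treats a year as 365.25 days, and uses hypothetical function names and dates.

```python
from datetime import date


def max_interval_years(system_type, extended=False):
    """Years allowed between surveys under the cadence described above.

    system_type is "CWS", "NTNCWS", or "TNCWS". `extended` marks a community
    system that has been granted the five-year interval.
    """
    if system_type == "CWS":
        return 5 if extended else 3
    return 5


def survey_gaps(visit_dates):
    """Return (previous, next, gap_in_years) for each pair of consecutive visits."""
    ordered = sorted(visit_dates)
    return [(a, b, (b - a).days / 365.25) for a, b in zip(ordered, ordered[1:])]


def overdue_gaps(visit_dates, system_type, extended=False):
    """Keep only the gaps that exceed the allowed interval for this system."""
    limit = max_interval_years(system_type, extended)
    return [gap for gap in survey_gaps(visit_dates) if gap[2] > limit]


# Hypothetical survey history for one community water system.
history = [date(2010, 5, 1), date(2013, 4, 15), date(2019, 6, 1)]
for previous, following, years in overdue_gaps(history, "CWS"):
    print(f"{previous} -> {following}: {years:.1f} years")
```

A production version would likely add a grace period, since a survey performed a few weeks past the anniversary is a scheduling artifact rather than a lapse.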

The survey is organized around eight standard evaluation elements, which correspond directly to the evaluation-area columns in the table. The surveyor inspects and assesses each in turn: (1) source water (the adequacy and protection of the wells or surface-water intakes, including wellhead and source-water protection); (2) treatment (the treatment processes and, critically, disinfection, including whether the system maintains adequate disinfectant residual); (3) the distribution system (the pipe network, the maintenance of positive pressure, cross-connection control, and backflow prevention); (4) finished-water storage (the integrity, protection, and sanitary condition of tanks and reservoirs holding treated water); (5) pumps, pumping facilities, and controls; (6) monitoring, reporting, and data verification (whether the system samples on schedule, uses certified labs, and reports correctly); (7) system management and operations (recordkeeping, emergency response and operations plans, and overall management capacity); and (8) operator compliance (whether the system is staffed by an appropriately certified operator). A deficiency found in any element is the unit of the survey's output, and significant deficiencies, meaning those that, if not corrected, could cause the system to deliver unsafe water, trigger required corrective action on a defined schedule and, frequently, a follow-up site visit.

This is where the deficiency-to-violation pipeline comes into focus, and it is the most analytically interesting feature of the dataset. A sanitary survey deficiency is not itself a violation of an MCL or a treatment technique; it is a finding that the system is at risk. But deficiencies are leading indicators. Because the site-visit record and the violations record share the PWSID, an analyst can ask a causal-flavored question that neither dataset can answer alone: do systems flagged with a given category of deficiency (say, a source-water or treatment finding) go on to incur measured violations at a higher rate than systems with clean surveys? Two distinct failure modes turn up in this analysis. The first is the failure to correct: a system is cited for a significant deficiency, does not remediate it, and later incurs a treatment-technique or MCL violation that the deficiency foreshadowed. The second is the monitoring and reporting violation, the single most common kind of SDWA violation by count, which a survey's monitoring/reporting evaluation often anticipates: a system with sloppy sampling practices noted on its survey is the system that later misses a required sampling deadline. Linking site visits to subsequent violations by PWSID is the mechanism that converts the inspection record from a backward-looking compliance log into a predictive tool for prioritizing which systems warrant attention next.
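
The flagged-versus-clean comparison posed above comes down to two rates and their ratio. The sketch below assumes the analyst has already reduced each system to two booleans (whether a survey flagged the area of interest, and whether a violation followed); the function name and the eight sample systems are hypothetical. A ratio above one shows an association, not proof that the deficiency caused the violation.

```python
def violation_rate_by_flag(systems):
    """Compare later-violation rates for flagged and clean systems.

    `systems` is an iterable of (flagged, later_violation) boolean pairs,
    one pair per water system.
    """
    counts = {True: [0, 0], False: [0, 0]}  # flagged -> [violators, systems]
    for flagged, later_violation in systems:
        bucket = counts[bool(flagged)]
        bucket[1] += 1
        if later_violation:
            bucket[0] += 1

    def rate(key):
        violators, total = counts[key]
        return violators / total if total else None

    flagged_rate, clean_rate = rate(True), rate(False)
    # The ratio is undefined when either group is empty or the clean rate is 0.
    ratio = flagged_rate / clean_rate if flagged_rate is not None and clean_rate else None
    return {"flagged_rate": flagged_rate, "clean_rate": clean_rate, "rate_ratio": ratio}


# Hypothetical data: four flagged systems, four with clean surveys.
sample = [
    (True, True), (True, True), (True, False), (True, False),
    (False, True), (False, False), (False, False), (False, False),
]
print(violation_rate_by_flag(sample))
```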

Lead, copper, PFAS, and the shifting focus of inspection

What sanitary surveyors look hardest at is not static; it tracks the contaminants that the program judges most urgent, and two recent regulatory developments have sharply changed the inspection emphasis: the revisions to the Lead and Copper Rule, and the first national drinking-water standard for PFAS.

The Lead and Copper Rule (LCR), first promulgated in 1991, is a treatment-technique rule rather than a simple MCL. Lead in tap water comes overwhelmingly not from the source water or the treatment plant but from the lead service lines and lead-bearing plumbing between the main and the tap, where corrosive water leaches lead from the pipe. The LCR therefore requires systems to control the corrosivity of their water (optimal corrosion-control treatment), to monitor lead and copper at customers' taps, and—if more than ten percent of sampled taps exceed the lead action level of fifteen parts per billion—to take additional steps including public education and lead service line replacement. The Flint, Michigan crisis, in which a change in source water without proper corrosion control caused lead to leach into the drinking water of a major city, made the rule's weaknesses nationally visible. The subsequent Lead and Copper Rule Revisions (LCRR) and the further-strengthened Lead and Copper Rule Improvements push systems toward a full inventory of their service-line materials, accelerated replacement of lead lines, lower trigger levels, and sampling at schools and child-care facilities. For the sanitary survey this means the source-water, treatment, and distribution evaluation elements increasingly incorporate corrosion-control performance and the status and accuracy of the system's lead service line inventory—an inspector now asks not only whether the water is treated, but whether the utility knows where its lead pipes are and is replacing them on schedule.
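
The action-level trigger described above is a simple share calculation. The sketch below follows the wording of this article (more than ten percent of sampled taps above fifteen parts per billion) and is a simplification: the rule itself prescribes a 90th-percentile calculation and a sampling protocol that are not reproduced here. The function name and the tap results are hypothetical.

```python
LEAD_ACTION_LEVEL_PPB = 15.0  # lead action level cited in the text above
EXCEEDANCE_SHARE = 0.10       # "more than ten percent of sampled taps"


def lead_action_level_exceeded(tap_results_ppb):
    """Return (exceeded, share_over) for a list of tap lead results in ppb."""
    if not tap_results_ppb:
        raise ValueError("at least one tap result is required")
    over = sum(1 for result in tap_results_ppb if result > LEAD_ACTION_LEVEL_PPB)
    share = over / len(tap_results_ppb)
    return share > EXCEEDANCE_SHARE, share


# Hypothetical sampling round: two of ten taps are above the action level.
exceeded, share = lead_action_level_exceeded([2, 3, 1, 18, 4, 0, 22, 5, 3, 2])
print(exceeded, f"{share:.0%}")
```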

The more consequential recent change is the 2024 national PFAS drinking-water standard. Per- and polyfluoroalkyl substances—a large family of persistent synthetic “forever chemicals” used in firefighting foam, nonstick and stain-resistant coatings, and countless industrial processes—are associated with serious health effects and resist breaking down in the environment, accumulating in source water and in the human body. In 2024 EPA finalized the first enforceable National Primary Drinking Water Regulation for PFAS, setting maximum contaminant levels for several individual compounds—including PFOA and PFOS at four parts per trillion—and a hazard-index approach for certain mixtures. This is a watershed for the entire program: it brings tens of thousands of systems into a brand-new monitoring obligation, forces many to evaluate and install advanced treatment (granular activated carbon, ion exchange, or reverse osmosis) that they have never operated, and creates an entirely new category of source-water vulnerability tied to nearby industrial and military sources. For the site-visits dataset, PFAS is reshaping the source-water and treatment evaluation elements—surveyors are beginning to assess PFAS source vulnerability, the system's PFAS monitoring posture, and, where MCLs are exceeded, the adequacy of the new treatment—and it is generating a wave of monitoring violations as systems come up the learning curve on an unfamiliar contaminant. Over the next several years the PFAS rule will be one of the dominant drivers of both inspection focus and the violations the inspections anticipate.

Joining to the violations data and the system inventory

The site-visits table is most valuable not in isolation but as one facet of the integrated SDWIS record, and the pwsid is the universal join key that makes the integration possible. Three joins matter most.

The first is to the system inventory. SDWIS maintains a master table of every public water system—its name, the primacy agency, the system type (CWS, NTNCWS, TNCWS), the population served, the primary water source type (groundwater, surface water, purchased), the number of service connections, and the system's activity status. Joining epa_sdwa_site_visits to the inventory by PWSID is what lets an analyst interpret a site visit in context: it is the inventory that supplies the population served and the system type needed to normalize inspection frequency, to separate community systems from transient ones, and to weight a finding by the number of people who depend on the system. Without the inventory join, the visits are anonymous; with it, every visit is anchored to a system of known size, type, and source.
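
A minimal sketch of the inventory join follows, written with plain dictionaries so the logic stays visible. The inventory field names system_type and population_served, the function names, and the sample rows are hypothetical; only pwsid and visit_date come from the table described earlier. Visits whose PWSID has no inventory match are returned separately rather than dropped, since unmatched rows are themselves a data-quality signal.

```python
def attach_inventory(visits, inventory):
    """Left-join visit rows to inventory rows on pwsid.

    visits: list of dicts, each with a "pwsid" key.
    inventory: dict mapping pwsid -> dict of inventory attributes.
    Returns (enriched_rows, unmatched_rows).
    """
    enriched, unmatched = [], []
    for visit in visits:
        system = inventory.get(visit["pwsid"])
        if system is None:
            unmatched.append(visit)
        else:
            enriched.append({**visit, **system})
    return enriched, unmatched


def population_visited_by_type(enriched):
    """Sum population served over the distinct systems visited, by system type."""
    seen, totals = set(), {}
    for row in enriched:
        if row["pwsid"] in seen:
            continue  # count each system's population once, however many visits
        seen.add(row["pwsid"])
        kind = row["system_type"]
        totals[kind] = totals.get(kind, 0) + row["population_served"]
    return totals


# Hypothetical rows: two visits to one CWS, one to a TNCWS, one orphan visit.
inventory = {
    "TX0000001": {"system_type": "CWS", "population_served": 12000},
    "TX0000002": {"system_type": "TNCWS", "population_served": 40},
}
visits = [
    {"pwsid": "TX0000001", "visit_date": "2018-03-01"},
    {"pwsid": "TX0000001", "visit_date": "2021-02-10"},
    {"pwsid": "TX0000002", "visit_date": "2019-07-22"},
    {"pwsid": "TX0000009", "visit_date": "2020-01-05"},
]
enriched, unmatched = attach_inventory(visits, inventory)
print(population_visited_by_type(enriched), len(unmatched))
```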

The second join is to the violations record, the one this article has emphasized. The SDWA violations dataset holds one row per violation, also keyed by PWSID, with the violation type (health-based MCL or treatment-technique violations versus monitoring and reporting violations), the contaminant or rule violated, the compliance period, and the return-to-compliance and enforcement status. Joining visits to violations by PWSID—and, with the visit and violation dates, ordering them in time—is what constructs the deficiency-to-violation pipeline: it lets the analyst test whether deficiencies precede violations, which deficiency categories are most predictive, and how long the lag is between an unaddressed finding and a measured failure. It also supports the inverse, retrospective question: of the systems that incurred a serious health-based violation, how many had a recent sanitary survey, and did that survey flag the relevant evaluation area? That question speaks directly to whether the inspection program is catching the problems it exists to catch.
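
The retrospective question can be sketched as a lookback from each violation date. The example assumes violations arrive as (pwsid, violation_date) pairs and visits as (pwsid, visit_date, flagged_areas) triples, where flagged_areas is a set of evaluation-area labels; the three-year default window, the function name, the area labels, and the sample rows are all illustrative assumptions.

```python
from datetime import date, timedelta


def lookback_summary(violations, visits, area, lookback_years=3):
    """Count violations preceded by a recent survey, and by one that flagged `area`."""
    window = timedelta(days=round(365.25 * lookback_years))
    visits_by_system = {}
    for pwsid, visit_date, flagged_areas in visits:
        visits_by_system.setdefault(pwsid, []).append((visit_date, flagged_areas))

    surveyed = flagged = 0
    for pwsid, violation_date in violations:
        # Visits that fall inside the window ending just before the violation.
        recent = [
            areas
            for visit_date, areas in visits_by_system.get(pwsid, [])
            if violation_date - window <= visit_date < violation_date
        ]
        if recent:
            surveyed += 1
            if any(area in areas for areas in recent):
                flagged += 1
    return {"violations": len(violations), "surveyed": surveyed, "flagged": flagged}


# Hypothetical records for four systems.
visits = [
    ("TX0000001", date(2018, 3, 1), {"treatment"}),
    ("TX0000002", date(2012, 1, 1), {"treatment"}),
    ("TX0000004", date(2019, 1, 1), {"pumps"}),
]
violations = [
    ("TX0000001", date(2019, 6, 1)),
    ("TX0000002", date(2019, 6, 1)),
    ("TX0000003", date(2020, 1, 1)),
    ("TX0000004", date(2019, 9, 1)),
]
print(lookback_summary(violations, visits, area="treatment"))
```

The gap between the surveyed count and the flagged count is the interesting number: it is the set of violations that a recent inspection preceded but did not anticipate.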

The third, broader join is to other EPA datasets through the facility linkage that EPA maintains across programs. Source-water contamination does not respect program boundaries: a community system drawing from groundwater downgradient of a hazardous-waste site, or surface water downstream of a permitted industrial discharger, faces a source-water vulnerability that lives in a different EPA database. Through ECHO and the Facility Registry Service, drinking-water systems can be related to nearby RCRA hazardous-waste handlers, to facilities reporting toxic releases, and to Clean Water Act dischargers, making it possible to ask which public water systems sit in the contamination shadow of which industrial sources, the kind of cross-program, source-to-tap analysis that PFAS in particular now demands.

Analytical uses

A national, system-resolved, date-stamped record of drinking-water inspections supports a distinctive set of analyses that the violations data alone cannot.

Inspection frequency and gaps by state is the most immediate use. Because each visit carries a date, a PWSID, and a primacy agency, an analyst can compute, for each system, the interval between sanitary surveys and flag systems that have gone beyond the three- or five-year requirement, then roll those gaps up by primacy agency to compare how completely each state is meeting its inspection obligations. The necessary caution—developed in the caveats—is that an apparent gap can reflect under-reporting rather than a genuinely un-inspected system, so the metric measures the combined effect of inspection diligence and reporting discipline.

Deficiency patterns by evaluation area exploit the eight evaluation-area columns. Aggregating findings across systems reveals where the program's weaknesses concentrate—whether, for small groundwater systems, the recurring problem is source protection, disinfection, or operator certification—and how those patterns differ by system type, size, and source. This is the diagnostic that tells a state program where to direct technical assistance and training rather than merely where to enforce.
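
Counting findings per evaluation area is a short aggregation over the eight columns listed earlier. In the sketch below the column names are the ones from the table description, but the coded values are placeholders: the letters "S", "M", and "N" stand in for whatever the current SDWIS code list defines, and the caller passes in which codes count as a deficiency. The function name and sample rows are hypothetical.

```python
from collections import Counter

# The eight evaluation-area columns from the table description above.
EVAL_COLUMNS = [
    "source_water_eval_code",
    "treatment_eval_code",
    "distribution_system_eval",
    "finished_water_storage_eval",
    "pumps_eval_code",
    "monitoring_reporting_eval",
    "management_ops_eval_code",
    "operator_compliance_eval",
]


def deficiency_counts(rows, deficiency_codes):
    """Count, per evaluation area, the visits whose code is in deficiency_codes."""
    counts = Counter()
    for row in rows:
        for column in EVAL_COLUMNS:
            if row.get(column) in deficiency_codes:
                counts[column] += 1
    return counts


# Hypothetical visit rows; "S", "M", and "N" are placeholder codes.
rows = [
    {"pwsid": "TX0000001", "source_water_eval_code": "S", "treatment_eval_code": "N"},
    {"pwsid": "TX0000002", "source_water_eval_code": "M", "treatment_eval_code": "S"},
    {"pwsid": "TX0000003", "source_water_eval_code": "N", "pumps_eval_code": "N"},
]
for column, n in deficiency_counts(rows, deficiency_codes={"S", "M"}).most_common():
    print(column, n)
```

Dividing each count by the number of visits in which that area was actually evaluated, rather than by all visits, avoids penalizing areas that some agencies leave blank.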

Linking site visits to subsequent violations is the analytic payoff already described at length: ordering visits and violations in time for each PWSID to measure the deficiency-to-violation pipeline, identify the most predictive deficiency categories, and convert inspection findings into a forward-looking risk score.

Finally, prioritizing systems serving vulnerable populations brings the inventory join to bear: combining a system's deficiency findings and violation history with the population it serves (and, through geography, with the demographics of the community) surfaces the small, under-inspected, deficiency-laden systems serving the people least able to absorb a contamination event, which is exactly the population an equity-minded oversight program should reach first.

Python workflow: site visits and violations from the SDWIS API

The script below pulls SDWA site visits and SDWA violations for a state from EPA's Envirofacts/SDWIS REST service, joins them on the PWSID, and computes two of the core metrics: visits-per-system (how many inspections the state recorded per system it visited, plus how many systems have no visit in five or more years) and the deficiency-to-violation pipeline (the share of visited systems that subsequently show a violation). No API key is required for public data. Because SDWIS extract column names vary between releases, the script discovers the working PWSID and date column names at runtime rather than hard-coding them; any production use should be validated against the current SDWIS metadata catalog and should page through the full result set for large states.

import pandas as pd
import requests

# EPA SDWIS / ECHO REST services -- no API key required for public data.
# Two endpoints are used together:
#   1. SDWIS site visits (sanitary surveys and follow-ups), keyed by PWSID
#   2. SDWIS violations, keyed by the same PWSID
# Column names vary slightly by SDWIS extract; the script discovers the
# working column names at runtime rather than hard-coding them.
SDWIS = "https://data.epa.gov/efservice"


def _rows(table, col, op, val, fmt="JSON", page=50000):
    # Envirofacts-style path grammar, as assumed by this script:
    #   /TABLE/COLUMN/OPERATOR/VALUE/FORMAT/rows/FIRST:LAST
    # Only the first `page` rows are requested; larger states need paging.
    path = f"{SDWIS}/{table}/{col}/{op}/{val}/{fmt}/rows/0:{page}"
    r = requests.get(path, timeout=120)
    r.raise_for_status()
    return r.json()


def site_visits(state):
    # SDWA_SITE_VISITS holds one row per inspection. PRIMACY_AGENCY_CODE
    # is the two-letter state / primacy-agency code.
    return _rows("SDWA_SITE_VISITS", "PRIMACY_AGENCY_CODE", "=", state)


def violations(state):
    # SDWA_VIOLATIONS holds one row per violation, also keyed by PWSID.
    return _rows("SDWA_VIOLATIONS", "PRIMACY_AGENCY_CODE", "=", state)


def _find(cols, *needles):
    # Return the first column whose name contains all of the needles.
    up = [(c, c.upper()) for c in cols]
    for c, u in up:
        if all(n.upper() in u for n in needles):
            return c
    return None


def analyze(state):
    sv = pd.DataFrame(site_visits(state))
    vi = pd.DataFrame(violations(state))
    if sv.empty:
        print(f"No site-visit records returned for {state}.")
        return

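    pwsid_sv = _find(sv.columns, "PWSID") or _find(sv.columns, "PWS", "ID")
    pwsid_vi = _find(vi.columns, "PWSID") or _find(vi.columns, "PWS", "ID")
    date_col = _find(sv.columns, "VISIT", "DATE") or _find(sv.columns, "DATE")
    if pwsid_sv is None:
        # Without a PWSID column none of the metrics below can be computed.
        print(f"No PWSID column found in site visits for {state}.")
        return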

    # --- Visits per system -----------------------------------------------
    visits_total = len(sv)
    systems_visited = sv[pwsid_sv].nunique()
    per_system = visits_total / max(systems_visited, 1)
    print(f"{state}: {visits_total:,} site visits across "
          f"{systems_visited:,} systems "
          f"({per_system:.2f} visits per visited system)")

    # --- Inspection recency ----------------------------------------------
    if date_col:
        sv[date_col] = pd.to_datetime(sv[date_col], errors="coerce")
        latest = sv.groupby(pwsid_sv)[date_col].max()
        stale = (pd.Timestamp.now() - latest).dt.days > (5 * 365)
        print(f"  Systems with no visit in 5+ years: "
              f"{int(stale.sum()):,} of {systems_visited:,}")

    # --- Deficiency-to-violation pipeline --------------------------------
    # Coarse first pass: systems that appear in both the site-visit and the
    # violation records. Visit and violation dates are NOT compared here.
    if not vi.empty and pwsid_vi:
        visited = set(sv[pwsid_sv].dropna())
        violators = set(vi[pwsid_vi].dropna())
        both = visited & violators
        rate = len(both) / max(len(visited), 1)
        print(f"  Visited systems that also show a violation: "
              f"{len(both):,} ({rate:.1%} of visited systems)")
    return sv, vi


# Compare the deficiency-to-violation pipeline across several states.
def pipeline_by_state(states):
    summary = {}
    for st in states:
        try:
            sv = pd.DataFrame(site_visits(st))
            vi = pd.DataFrame(violations(st))
        except requests.RequestException:
            # Skip states whose request fails (HTTP error, timeout, connection).
            continue
        if sv.empty:
            continue
        p_sv = _find(sv.columns, "PWSID") or _find(sv.columns, "PWS", "ID")
        p_vi = _find(vi.columns, "PWSID") or _find(vi.columns, "PWS", "ID")
        if p_sv is None:
            continue
        visited = set(sv[p_sv].dropna())
        violators = set(vi[p_vi].dropna()) if (not vi.empty and p_vi) else set()
        summary[st] = {
            "visits": len(sv),
            "systems": len(visited),
            "to_violation": len(visited & violators),
        }
    return dict(sorted(summary.items(),
                       key=lambda kv: -kv[1]["to_violation"]))


if __name__ == "__main__":
    analyze("TX")
    # print(pipeline_by_state(["TX", "CA", "PA", "NY", "FL", "OH"]))

Two practical notes apply. First, the deficiency-to-violation calculation in the script is deliberately coarse: it measures the overlap between the set of visited systems and the set of systems with any violation, regardless of which came first, so it overstates the pipeline and serves only as a first pass. A rigorous pipeline analysis must respect the time order, counting only violations that occur after the site visit that flagged the relevant deficiency, and should restrict the comparison to the evaluation area that matches the violation type, so that a source-water deficiency is tested against subsequent treatment or MCL violations rather than against unrelated monitoring lapses. The visit and violation dates supply everything needed to do this; the script leaves the temporal refinement as the natural next step. Second, for national-scale work (ranking every primacy agency, or building the full inventory-joined equity analysis) EPA's SDWIS public data download files and the ECHO bulk data services are far more efficient than thousands of paginated API calls and ship with the authoritative, version-stamped column definitions for the release.
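
The temporal refinement can be sketched without the API at all. The example below assumes flagged visits arrive as (pwsid, visit_date, area) triples and violations as (pwsid, violation_date, category) triples. The area labels, the category labels "MCL", "TT", and "MR", and the mapping between them are hypothetical stand-ins for whatever coding the analyst derives from the SDWIS metadata.

```python
from datetime import date

# Hypothetical mapping from evaluation area to the violation categories
# that a deficiency in that area would plausibly foreshadow.
AREA_TO_VIOLATION = {
    "source_water": {"MCL", "TT"},
    "treatment": {"MCL", "TT"},
    "monitoring_reporting": {"MR"},
}


def matched_later_violations(flagged_visits, violations):
    """Link each flagged visit to the first matching violation that follows it.

    Returns a list of (pwsid, area, category, lag_days) tuples.
    """
    by_system = {}
    for pwsid, violation_date, category in violations:
        by_system.setdefault(pwsid, []).append((violation_date, category))

    links = []
    for pwsid, visit_date, area in flagged_visits:
        wanted = AREA_TO_VIOLATION.get(area, set())
        for violation_date, category in sorted(by_system.get(pwsid, [])):
            # Only violations strictly after the visit, in a matching category.
            if violation_date > visit_date and category in wanted:
                links.append((pwsid, area, category, (violation_date - visit_date).days))
                break  # keep the earliest matching violation only
    return links


# Hypothetical records: one system with a later matching violation, one without.
flagged = [
    ("TX0000001", date(2018, 3, 1), "source_water"),
    ("TX0000002", date(2019, 5, 1), "monitoring_reporting"),
]
violations = [
    ("TX0000001", date(2017, 1, 1), "MCL"),  # before the visit: ignored
    ("TX0000001", date(2018, 9, 1), "MR"),   # after, but category does not match
    ("TX0000001", date(2019, 3, 1), "TT"),   # after and matching: linked
    ("TX0000002", date(2019, 4, 1), "MR"),   # before the visit: ignored
]
print(matched_later_violations(flagged, violations))
```

The lag in days that each link carries is what allows the analyst to estimate how long an unaddressed finding typically takes to surface as a measured failure.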

Limitations and analytical caveats

The site-visits dataset is the most comprehensive public record of drinking-water system inspections in the United States, but it carries structural limitations that an analyst must internalize before drawing conclusions from it.

State reporting varies, because the states are the reporters. Under primacy, the inspections are conducted and recorded by fifty-odd independent state programs, each with its own data systems, its own conventions for how it codes a visit reason or scores an evaluation area, and its own discipline about forwarding results to EPA. Apparent differences between states in inspection frequency, in the rate at which deficiencies are recorded, or in which evaluation areas are flagged may partly reflect these reporting and coding differences rather than real differences in water-system condition or inspector diligence. Cross-state comparisons should be made with this firmly in mind, and ideally normalized or sanity-checked against the system inventory.

There is a primacy-agency data lag. A sanitary survey conducted in the field has to be written up, entered into the state's data system, and then transmitted to EPA's SDWIS before it appears in the federal extract. The interval from inspection to appearance in the national data can be substantial and varies by state. The most recent months of inspection activity are therefore systematically under-represented in any snapshot of the dataset, which means recency metrics—“systems with no visit in five years”—will overstate the true gap at the leading edge. This dataset is authoritative for established patterns and multi-year trends; it is not a real-time monitor of what was inspected last month.
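A simple guard against the leading-edge bias is to evaluate the recency metric as of an earlier effective date, pulled back from the snapshot by an assumed reporting lag. The sketch below does that; the twelve-month lag is a placeholder assumption rather than a documented figure, and the column names are hypothetical.

```python
import pandas as pd

def overdue_systems(visits, inventory_ids, snapshot_date, lag_months=12, years=5):
    """Systems with no recorded visit in `years` before a lag-adjusted date."""
    # Assumed reporting lag: treat the data as complete only up to this date.
    effective = pd.Timestamp(snapshot_date) - pd.DateOffset(months=lag_months)
    cutoff = effective - pd.DateOffset(years=years)
    dates = pd.to_datetime(visits["visit_date"])
    last_visit = dates.groupby(visits["pwsid"]).max()
    # Reindex on the inventory so never-visited systems appear as NaT.
    last_visit = last_visit.reindex(inventory_ids)
    overdue = last_visit.isna() | (last_visit < cutoff)
    return overdue[overdue].index.tolist()

# Toy data, invented for illustration only.
visits = pd.DataFrame({
    "pwsid": ["A1", "B2", "B2", "D4"],
    "visit_date": ["2019-06-01", "2014-02-01", "2017-03-01", "2023-01-01"],
})
inventory_ids = ["A1", "B2", "C3", "D4"]

naive = overdue_systems(visits, inventory_ids, "2024-12-31", lag_months=0)
buffered = overdue_systems(visits, inventory_ids, "2024-12-31", lag_months=12)
print(naive, buffered)
```

With no buffer, system A1 is flagged as overdue even though a follow-up visit could simply be sitting in a state's reporting queue; with the buffer, only the system last visited in 2017 and the system with no visit on record remain. The buffer trades a less current answer for one that is less exposed to reporting lag.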

Small systems are under-covered. The skew of the system inventory toward very small systems is mirrored by a skew in the data's completeness. The very small and small systems—rural water associations, single-well transient systems, the operator working part-time across several systems—are the ones most likely to be inspected at the longest permissible intervals, to have findings recorded inconsistently, and to be under-represented relative to the professionally staffed large utilities. Because these are also the systems most prone to actual deficiencies, the data paradoxically captures the best-run systems most completely and the most-at-risk systems least completely. An analysis that treats the absence of recorded deficiencies as evidence of compliance will systematically understate the problem precisely where the problem is worst.
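To avoid reading an absence of recorded deficiencies as compliance, it helps to keep "never visited" separate from "visited with nothing recorded" and to break the result out by system size. A minimal sketch, assuming hypothetical columns pwsid, size_category, and has_deficiency and invented size labels:

```python
import pandas as pd

def coverage_by_size(inventory, visits):
    """Share of systems in each size class by what the record actually shows."""
    visited = set(visits["pwsid"])
    flagged = set(visits.loc[visits["has_deficiency"], "pwsid"])
    inv = inventory.copy()
    # Three distinct states: silence in the data is not a clean inspection.
    inv["status"] = "no_visit_on_record"
    inv.loc[inv["pwsid"].isin(visited), "status"] = "visited_no_deficiency"
    inv.loc[inv["pwsid"].isin(flagged), "status"] = "visited_with_deficiency"
    # Row-normalize so each size class sums to 1.
    return pd.crosstab(inv["size_category"], inv["status"], normalize="index")

# Toy data, invented for illustration only.
inventory = pd.DataFrame({
    "pwsid": ["V1", "V2", "V3", "V4", "L1", "L2"],
    "size_category": ["very_small"] * 4 + ["large"] * 2,
})
visits = pd.DataFrame({
    "pwsid": ["V1", "L1", "L2"],
    "has_deficiency": [True, False, False],
})

coverage = coverage_by_size(inventory, visits)
print(coverage)
```

In the toy data, three of the four very small systems have no visit on record at all. A tally that counted only recorded deficiencies would show one deficiency among the very small systems and none among the large ones, without revealing that most of the small systems were never observed.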

A site visit is not a violation, and an evaluation-area code is a summary. The eight evaluation-area indicators compress a surveyor's detailed, often narrative, field findings into coded summaries; the full texture of what was found—how serious a deficiency was, whether it was corrected on the spot, what the corrective-action schedule required—lives in the underlying state survey reports, not in the federal extract. And a sanitary survey deficiency is a risk finding, not a legal determination that a standard was breached. Treating a deficiency as if it were a violation, or treating a clean evaluation-area code as a guarantee that the area is sound, over-reads what the dataset can bear.

Held with these caveats in mind, the epa_sdwa_site_visits table is a uniquely valuable resource: a system-resolved, date-stamped, evaluation-area-scored record of the on-site inspections that stand between the country's public water systems and the contaminants the Safe Drinking Water Act exists to keep out of the tap—the preventive half of a regulatory program whose failures, when the prevention does not hold, show up next door in the violations record.

Related writing

EPA RCRA Hazardous Waste Data: The Federal Database Behind 400,000 Regulated Facilities — Source-water contamination is often the downstream consequence of upstream hazardous-waste mismanagement, and through EPA's facility linkage a public water system can be related to the RCRA generators and corrective-action sites whose releases threaten the groundwater it draws from, turning a sanitary survey's source-water evaluation into a cross-program contamination question.

EPA Pollutant Emissions: The Federal Database Behind 10 Million Facility-Level Air and Toxic Release Records — Air deposition of mercury and the migration of industrial toxics into surface and groundwater connect facility emissions to drinking-water source vulnerability, and the same PFAS compounds now driving the 2024 drinking-water standard appear on the toxic-release side of the emissions record.

EPA Enforcement Defendants: The Federal Database Behind 200,000 Environmental Cases — When a sanitary survey's deficiencies harden into uncorrected violations, the next stage is enforcement, and the defendant record names the parties EPA and the states pursue across the Safe Drinking Water Act and the rest of the environmental statutes.