The Opioid Epidemic in Three Federal Datasets: Distribution, Death, and Treatment

No single federal dataset shows the opioid epidemic whole. The agency that knows how the pills were shipped does not count the dead; the agency that counts the dead does not map the treatment; the agency that maps the treatment never saw the pills. But three datasets, laid over the same map, do show it: the DEA's record of every controlled-substance shipment down to the pharmacy, the CDC's record of every drug-poisoning death drawn from death certificates, and the CMS record of where Medicare addiction treatment actually reaches the survivors. Join them on geography and the supply, the toll, and the response finally line up in one view.

This article covers why the opioid epidemic is invisible in any one federal dataset and visible across three; what each of the three sources is and why they live in different agencies for different reasons—the DEA's ARCOS distribution record, the CDC's overdose-mortality series, and the CMS opioid-treatment-program file; how geography—the FIPS state and county codes—is the join key that aligns them; the arc the assembled data traces, from the prescription-pill wave through the shift to heroin and illicit fentanyl to a treatment response that arrived late and unevenly; how the data grounds the opioid litigation and its multibillion-dollar settlements; the analytical questions the join makes answerable, from per-capita pill burden against death rate to whether treatment capacity followed the need; a Python workflow that pulls the sources from their genuine public forms—including a live call to the CMS data.cms.gov API—and computes the core metrics; and the caveats—suppression, lag, the ecological fallacy, and the residence-versus-occurrence problem—that every analyst must internalize.

Why three datasets, and why three agencies

The opioid epidemic is not one event but a chain of them: drugs are manufactured and distributed, people become dependent and some of them die, and a treatment system tries to reach the survivors. Each link in that chain is the responsibility of a different part of the federal government, and each part keeps its own record for its own reasons. The Drug Enforcement Administration tracks the movement of controlled substances because that is its statutory job under the Controlled Substances Act. The Centers for Disease Control and Prevention—through the National Center for Health Statistics—counts deaths because it runs the national vital-statistics system. The Centers for Medicare & Medicaid Services maps treatment providers because it pays for the treatment. None of the three was built to describe the epidemic; each was built to do an agency's ordinary work, and the epidemic is what you see when you stack the three records on top of one another.

That fragmentation is the central analytical fact. An analyst who works only from the DEA data sees a flood of pills but not who died. One who works only from the CDC data sees a rising body count but not where the pills came from or whether help arrived. One who works only from the CMS data sees a thin and patchy treatment network but cannot say whether it was thin where the need was greatest. The value of joining the three is precisely that it closes those blind spots—it lets a single query ask whether the counties that were flooded with pills became the counties with the most deaths, and whether the treatment went where the deaths were. In our database the three are stored as the tables dea_arcos, cdc_overdose, and cms_opioid_treatment, each already organized by geography and year, so the work is aligning the geographic codes across the three rather than parsing each raw source from scratch.

The DEA ARCOS distribution record

The first source is the supply side: the DEA's Automation of Reports and Consolidated Orders System (ARCOS). ARCOS is a reporting system established under the Controlled Substances Act through which manufacturers and distributors of controlled substances must report their transactions to the DEA. Every time a registered manufacturer ships a covered controlled substance to a distributor, and every time a distributor ships it onward to a pharmacy, a hospital, or a practitioner, that transaction is reported into ARCOS. For the prescription-opioid era this means ARCOS holds, in effect, a shipment-level ledger of how the pills moved through the legal supply chain, from factory to wholesaler to the pharmacy counter where they were dispensed.

What made ARCOS central to understanding the epidemic was a court-ordered release. Litigation by news organizations pried loose a DEA extract covering transactions of oxycodone and hydrocodone, the two opioids at the heart of the prescription wave, for the years 2006 through 2014. The release is widely described as roughly 380 million transactions, a figure that, as reported at the time, described the initial 2006 through 2012 tranche; the 2013 and 2014 records were added afterward. For the first time the public could see, pharmacy by pharmacy and county by county, exactly how many pills had been shipped where. The data is reported in dosage units (individual pills), which is why the most quoted ARCOS figure is a per-capita pill count: the number of prescription opioid pills shipped into a county divided by its population. Because each transaction carries the buyer's location, ARCOS aggregates cleanly to the county, and that county aggregate is the form in which it joins the other two datasets.

For the join, the table's grain is the county-year, and the load-bearing columns are geography, year, and the pill count:

-- dea_arcos (county-year grain)
buyer_county_fips    -- 5-digit FIPS code of the pharmacy's county
buyer_state          -- 2-letter state of the buyer
year                 -- transaction year (2006-2014 in the public release)
drug_name            -- oxycodone or hydrocodone (the released opioids)
dosage_unit          -- pills shipped (the count that becomes per-capita)
transactions         -- number of reported shipments
population           -- county population used for the per-capita rate

The buyer_county_fips is the column that matters for the join. The Federal Information Processing Standards county code—a two-digit state code followed by a three-digit county code—is the universal geographic key in US federal data, and because ARCOS records the buyer's county it can be reduced to a FIPS-keyed table of pills per county per year. That table answers the supply-side question on its own: which counties received the most opioids per resident, and when did the shipments peak and fall. But its real power is as one of three geographically keyed tables, where the same county code that counts the pills can be set against the county code that counts the deaths.
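As a minimal sketch of that reduction, the snippet below collapses rows shaped like the dea_arcos listing into pills per resident for each county-year. The rows are invented for illustration, the FIPS codes 99001 and 99002 are placeholders rather than real counties, and the function name pills_per_capita is hypothetical. It uses plain Python so it runs without pandas.

```python
from collections import defaultdict

# Invented rows in the shape of the dea_arcos listing; the FIPS codes
# "99001" and "99002" are placeholders, not real counties.
ROWS = [
    {"buyer_county_fips": "99001", "year": 2008, "drug_name": "oxycodone",
     "dosage_unit": 1_200_000, "population": 26_000},
    {"buyer_county_fips": "99001", "year": 2008, "drug_name": "hydrocodone",
     "dosage_unit": 2_700_000, "population": 26_000},
    {"buyer_county_fips": "99002", "year": 2008, "drug_name": "oxycodone",
     "dosage_unit": 4_000_000, "population": 400_000},
]

def pills_per_capita(rows):
    """Sum dosage units across drugs, then divide by county population."""
    pills = defaultdict(int)
    population = {}
    for r in rows:
        key = (r["buyer_county_fips"], r["year"])
        pills[key] += r["dosage_unit"]
        # Assumes population repeats unchanged on every row of a county-year.
        population[key] = r["population"]
    return {key: pills[key] / population[key] for key in pills}

rates = pills_per_capita(ROWS)
for (fips, year), rate in sorted(rates.items()):
    print(fips, year, f"{rate:.1f} pills per resident")
```

The point of summing before dividing is that the two drugs share one denominator; dividing each drug row separately and then adding would give the same answer here but invites errors once population differs by source.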

The CDC overdose-mortality record

The second source is the toll: the CDC's drug-overdose mortality data. Its origin is the most fundamental public record there is—the death certificate. Every death in the United States is registered with a state vital-records office, and those records flow up through the National Vital Statistics System (NVSS), operated by the CDC's National Center for Health Statistics. A death certificate records, among much else, the causes of death, which a nosologist codes to the International Classification of Diseases, Tenth Revision (ICD-10). Drug-overdose deaths are identified by a specific set of ICD-10 underlying-cause codes for drug poisoning—the X40–X44 (unintentional), X60–X64 (suicide), X85 (homicide), and Y10–Y14 (undetermined intent) codes—and, where the certificate names the drugs involved, further multiple-cause codes (the T-codes) distinguish the specific substance, including the opioid categories that separate natural and semi-synthetic opioids, heroin, and synthetic opioids such as fentanyl.
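The coding rule described above can be sketched as a small classifier. The underlying-cause ranges come straight from the paragraph; the specific T40 sub-codes in the lookup are the standard ICD-10 opioid categories as generally documented and should be checked against the NCHS documentation before use. The function names are hypothetical.

```python
import re

def is_drug_poisoning(underlying_cause):
    """True if an ICD-10 underlying-cause code is X40-X44, X60-X64, X85, or Y10-Y14."""
    code = underlying_cause.upper().replace(".", "")[:3]
    if not re.fullmatch(r"[XY]\d\d", code):
        return False
    letter, num = code[0], int(code[1:])
    if letter == "X":
        return 40 <= num <= 44 or 60 <= num <= 64 or num == 85
    return 10 <= num <= 14

# Multiple-cause T-codes for opioids (assumed mapping; verify against NCHS).
OPIOID_T_CODES = {
    "T40.0": "opium",
    "T40.1": "heroin",
    "T40.2": "natural and semi-synthetic opioids",
    "T40.3": "methadone",
    "T40.4": "synthetic opioids other than methadone",
    "T40.6": "other and unspecified narcotics",
}

def opioid_categories(multiple_causes):
    """Return the sorted opioid categories named among a certificate's T-codes."""
    return sorted({OPIOID_T_CODES[c] for c in multiple_causes if c in OPIOID_T_CODES})

print(is_drug_poisoning("X42"), opioid_categories(["T40.1", "T40.4", "T50.9"]))
```

A death counts as an overdose on the underlying-cause test alone; the T-codes only say which drugs were named, and a certificate can carry several at once.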

The CDC publishes these mortality data through CDC WONDER, the agency's public query system, and through the NCHS data portal. WONDER lets a user request drug-overdose death counts and rates cross-tabulated by year, by cause, and, crucially for the join, by county of residence of the decedent. That county is carried as a FIPS code, the same key ARCOS uses. The deaths are typically reported both as a raw count and as a rate per 100,000 population, and the rate is what makes counties comparable: a small rural county and a large metropolitan one cannot be compared on raw death counts, but they can be compared on the death rate. The defining feature of this dataset for an analyst is therefore that it is, like ARCOS, a FIPS-keyed county-year series, here of deaths rather than pills, that can be set against the supply data.

One structural quirk of the mortality data shapes everything downstream and is treated at length in the caveats: suppression. To protect the confidentiality of decedents in small populations, the CDC suppresses county-level cells with fewer than ten deaths—they appear not as a number but as a suppression flag. The opioid epidemic is, in many places, a rural and small-county phenomenon, exactly where death counts are low enough to be suppressed, so the dataset is most silent precisely where a naive analysis would most want it to speak. Any county-level mortality analysis must decide how to handle suppressed cells before it computes a single rate.
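One way to make that decision explicit is to parse the death cells into either a number or an explicit missing value, and to report how many cells were lost. The sketch below assumes the suppression flag arrives as non-numeric text such as "Suppressed"; the exact flag text in a given export is an assumption, and both function names are hypothetical.

```python
def parse_deaths(cell):
    """Return an int death count, or None for a suppressed or missing cell."""
    text = str(cell).strip().replace(",", "")
    return int(text) if text.isdigit() else None

def suppression_report(cells):
    """Count how many cells could not be read as a number."""
    parsed = [parse_deaths(c) for c in cells]
    suppressed = sum(p is None for p in parsed)
    share = suppressed / len(parsed) if parsed else 0.0
    return {"total": len(parsed), "suppressed": suppressed,
            "share_suppressed": share}

# Invented cells for illustration only.
report = suppression_report(["12", "Suppressed", "Suppressed", "40"])
print(report)
```

Keeping suppressed cells as None rather than zero matters: a zero would enter a rate calculation as a real observation, while None forces the analyst to handle it.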

The CMS opioid-treatment-program record

The third source is the response: the CMS record of opioid-treatment programs. An opioid treatment program (OTP) is a facility certified to dispense the medications that treat opioid use disorder: principally methadone, which under federal rules can only be dispensed for addiction treatment through a certified OTP, and buprenorphine, the medication that broadened access to treatment. Together these constitute the core of medication-assisted treatment, the evidence-based standard of care for opioid use disorder. For decades methadone treatment sat largely outside the main federal health-insurance programs. The Medicare opioid-treatment-program benefit, a Part B bundled benefit created by the SUPPORT Act and effective January 2020, established Medicare coverage of OTP services for the first time, and with it a federal provider file enumerating the programs enrolled to furnish that benefit.

CMS publishes that provider file on data.cms.gov. Each row is an enrolled opioid-treatment program, carrying its name, its NPI, its address, its phone, and its enrollment status—keyed by the program's CMS Certification Number and enrollment ID. The geography it carries natively is the state and street address, not a precomputed FIPS county code, so county-level work requires deriving the county from the address (a ZIP-to-county crosswalk or geocoding step); the simplest tractable join brings the CMS counts in at the state level. Counting OTP records yields a third geographically keyed table—not pills, not deaths, but the count of certified treatment programs available in each place. This is a coarse measure of treatment capacity—it counts programs, not slots or patients, and it captures the Medicare-enrolled OTP universe specifically rather than every treatment resource in an area—but it is enough to ask the response-side question that the other two datasets cannot: did treatment capacity exist where the deaths were concentrated?

The treatment record differs from the other two in a way worth naming. ARCOS and the CDC data are retrospective tallies of harm—pills shipped, people dead. The CMS OTP file is a snapshot of present capacity, and a recent one, since the benefit it reflects only began in 2020. That timing is itself part of the story the assembled data tells: the supply data runs from the late-2000s pill wave, the mortality data runs through the entire arc and is still climbing, and the treatment data captures a response that, at the federal-coverage level, only arrived a decade and a half into the crisis. Aligning the three is partly an exercise in holding their different time bases honestly in view.

Geography is the join key

The three datasets share almost nothing—not an agency, not a record grain, not a unit of measure, not even a person, since the people who received the pills, the people who died, and the people in treatment are overlapping but distinct populations. What they share is place. ARCOS reports by the buyer's county and the CDC reports deaths by the decedent's county of residence—both resolving to the same five-digit FIPS county code—while the CMS file reports each program by its address, which resolves natively to a state and, after a ZIP-to-county step, to the same FIPS scheme. That common geography is the only available join key, and it is what makes the integrated analysis possible at all.

Working the join is therefore a matter of getting three things to agree: the geographic code, the year, and the population denominator. The FIPS code must be normalized to a consistent five-digit, zero-padded string, because some sources drop the leading zero on state codes below ten and some store the code as an integer. The year must be aligned, recognizing that the three series do not cover the same years—ARCOS in the public release runs 2006–2014, the CDC mortality series runs across the whole period and continues, and the CMS OTP file is a recent snapshot—so a clean three-way county-year join only exists for the overlapping window, and outside it the analysis must pair the sources two at a time or step the CMS file up to the state level. And the population denominator must be consistent: the per-capita pill rate and the death rate per 100,000 should use the same county population estimates (the Census intercensal and postcensal estimates are the standard choice) so that the two rates are genuinely comparable rather than each normalized against a different denominator.
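The FIPS normalization step can be sketched as a single function that accepts the forms the paragraph warns about (integers, strings missing the leading zero, values read as floats) and returns a five-digit string. The name normalize_fips is hypothetical, and the handling of a trailing ".0" is an assumption about how a float-typed column would render.

```python
def normalize_fips(value):
    """Return a 5-digit zero-padded county FIPS string, or None if unusable."""
    if value is None:
        return None
    text = str(value).strip()
    if text.endswith(".0"):      # codes that were read as floats, e.g. 1001.0
        text = text[:-2]
    if not text.isdigit() or len(text) > 5:
        return None              # flags, blanks, or codes that are too long
    return text.zfill(5)

for raw in (1001, "1001.0", "06037", "Suppressed"):
    print(repr(raw), "->", normalize_fips(raw))
```

Once every source passes through the same function, the state code is simply the first two characters of the result, which keeps the county and state keys consistent with one another.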

A subtlety lurks in county geography itself. County boundaries and FIPS codes are not perfectly stable over time—a handful of counties have been renamed, split, or re-coded, and a few jurisdictions (notably some in Alaska and Virginia's independent cities) require special handling—so a long time series must use a crosswalk to a consistent set of county definitions rather than assuming a code means the same place in every year. For most of the lower forty-eight this is a minor cleanup, but it is the kind of detail that separates a defensible county-level join from one that silently drops or mismatches a few percent of the data. Once the geographic code, the year, and the denominator agree, the ARCOS and CDC tables collapse into a single county-resolved frame in which each row carries pills per capita and the overdose death rate for one place, and the CMS counts can be laid alongside it—the view none of the three sources provides alone.
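A crosswalk of this kind is, at its simplest, a lookup from retired codes to current ones applied before any aggregation. The sketch below holds only three well-known changes and is deliberately partial; a real analysis would load a complete crosswalk from Census documentation. The names FIPS_CROSSWALK, harmonize, and harmonize_counts are hypothetical.

```python
# Partial, illustrative crosswalk of retired -> current county FIPS codes.
FIPS_CROSSWALK = {
    "46113": "46102",  # Shannon County, SD, renamed Oglala Lakota County
    "02270": "02158",  # Wade Hampton Census Area, AK, renamed Kusilvak
    "51515": "51019",  # Bedford city, VA, merged into Bedford County
}

def harmonize(fips):
    """Map a retired code to its current equivalent; pass others through."""
    return FIPS_CROSSWALK.get(fips, fips)

def harmonize_counts(counts):
    """Re-key a {fips: count} mapping, summing counts that land on one code."""
    out = {}
    for fips, n in counts.items():
        key = harmonize(fips)
        out[key] = out.get(key, 0) + n
    return out

# Invented counts for illustration only.
print(harmonize_counts({"51515": 3, "51019": 10, "06037": 5}))
```

Summing on the harmonized key is what prevents the silent mismatch: without it, the retired code simply fails to find a partner in the other table and the row disappears from an inner join.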

The arc the data traces

Assembled, the three datasets do more than answer point queries; they trace the shape of the epidemic over twenty years, and the shape is the single most important thing to understand about it. The crisis came in waves, and the three sources illuminate different waves.

The first wave is the one ARCOS quantifies most directly: the prescription-opioid era of roughly the late 1990s through the early 2010s, when aggressive marketing and a shift in pain-management practice drove a vast expansion in the prescribing of oxycodone, hydrocodone, and related opioids. The ARCOS pill counts make the scale of that flood concrete—hundreds of millions of pills into individual counties—and they make visible the geographic concentration that became notorious, the Appalachian and rural communities that received pills per capita at multiples of the national rate. As supply tightened—through prescription-monitoring programs, reformulation of the most-abused products, and tighter regulation—ARCOS shows pill shipments falling in the early 2010s.

But the CDC mortality data reveals the cruel discontinuity at the heart of the epidemic: overdose deaths kept climbing even as pill shipments fell. As the supply of prescription opioids contracted, many people who had become dependent shifted to heroin, cheaper and more available—the second wave, visible in the mortality data as a rise in heroin-coded deaths in the early-to-mid 2010s. Then came the third and deadliest wave: illicitly manufactured fentanyl and its analogues, synthetic opioids so potent that they drove overdose deaths to levels far beyond anything the prescription era produced, increasingly contaminating not just heroin but counterfeit pills and other drugs. The mortality series, broken out by the ICD-10 multiple-cause drug codes, shows this substitution as a baton-pass: prescription-opioid deaths leveling, heroin deaths rising and then themselves overtaken, and synthetic-opioid deaths climbing steeply. The CMS treatment data, finally, shows the response—the federal-coverage expansion of medication treatment—arriving late and spread unevenly across the very places the other two datasets show were hardest hit. Supply, substitution, and a lagging response: that is the arc, and it is legible only because three datasets are read together.

Grounding the opioid litigation

The integrated data is not merely a public-health curiosity; it became the evidentiary substrate of one of the largest waves of litigation in American history. Thousands of lawsuits—brought by states, counties, cities, and tribes against opioid manufacturers, distributors, and pharmacy chains—were consolidated and litigated on the theory that the defendants had flooded communities with opioids while ignoring obvious signs of diversion. The ARCOS data was central to that case precisely because it was the defendants' own transaction record: it showed, county by county, exactly how many pills each distributor had shipped, which is why its court-ordered release was so consequential.

The litigation produced settlements totaling in the tens of billions of dollars, with the funds directed substantially toward abatement—treatment, prevention, and recovery services—allocated to states and localities. The three datasets together are exactly the apparatus for evaluating that allocation. ARCOS quantifies the historical exposure that a settlement is meant to redress; the CDC mortality data measures the ongoing toll that abatement is meant to reduce; and the CMS treatment data measures whether the abatement dollars are actually building treatment capacity where it is needed. A place that received an enormous per-capita pill burden, that carries a high overdose death rate, and that still has few or no certified treatment programs is, in one row of the joined data, both the strongest claim on settlement funds and the clearest test of whether those funds are reaching their target. The data that grounded the liability case is the same data that can audit the remedy.

What the join makes answerable

A single geographically resolved frame carrying pills per capita, the overdose death rate, and the treatment-program count supports a set of questions that none of the three sources can answer alone, and that are the whole point of the assembly.

Did the pills track the deaths? The foundational question pairs ARCOS and the CDC: do the counties that received the most opioids per resident during the prescription era also carry the highest overdose death rates? Computing the correlation between per-capita pill shipments and the death rate across counties—and, with care, lagging the deaths behind the shipments to respect the time order—tests the supply-to-harm link directly. The relationship is real but it is not simple: the prescription wave seeded dependence, but the fentanyl wave that drove the deadliest years was an illicit-supply phenomenon that ARCOS, a record of the legal supply chain, does not capture at all. The correlation is strongest for the early years and weakens as the epidemic moves to substances ARCOS never tracked—which is itself one of the most important findings the join surfaces.

Did treatment follow the need or lag it? The response-side question brings in the CMS data: among the places with the highest death rates, how many have any certified opioid-treatment program, and how does treatment-program density compare between the hardest-hit areas and the rest? This is the analysis that surfaces treatment deserts—counties with high mortality and zero or few programs, often the same rural counties where the death counts are low enough to be suppressed and the prescription burden was once highest. Layering the three also supports trend work: tracking, over the overlapping years, whether the gap between need and treatment capacity narrowed as settlement and federal-coverage dollars flowed, or whether the response continued to lag the geography of the harm. These are the questions that turn three administrative byproducts into a coherent account of a public-health catastrophe and its management.
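A treatment-desert screen can be sketched as a filter over the joined frame: rank places by death rate, take the top share, and keep those with no program on record. The 25 percent cutoff is an arbitrary illustration, not a standard definition, and treating a place that is absent from the program table as having zero programs is an assumption. The function name and the input values are hypothetical.

```python
def treatment_deserts(death_rate, otp_count, top_share=0.25):
    """Places in the top `top_share` by death rate with no program on record."""
    ranked = sorted(death_rate, key=death_rate.get, reverse=True)
    cutoff = max(1, int(len(ranked) * top_share))
    # A place missing from otp_count is treated as having zero programs.
    return [p for p in ranked[:cutoff] if otp_count.get(p, 0) == 0]

# Invented rates per 100k and program counts for four placeholder places.
death_rate = {"A": 60.0, "B": 45.0, "C": 20.0, "D": 10.0}
otp_count = {"B": 2, "C": 1}
print(treatment_deserts(death_rate, otp_count, top_share=0.5))
```

Because suppressed counties never receive a death rate, they cannot appear in this list at all, so the screen understates deserts in exactly the small counties the paragraph highlights.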

Python workflow: joining ARCOS, CDC, and CMS on geography

The script below loads the three sources from their genuine public forms—the ARCOS county-level release and a CDC WONDER / NCHS drug-overdose export by county of residence as local files, and the CMS opioid-treatment-program provider file fetched live from the data.cms.gov datastore API (no key required)—normalizes the ARCOS and CDC tables to a five-digit FIPS key, joins them, and computes the core metrics: per-capita pill shipments and the overdose death rate per 100,000. It then tests how tightly supply and death track one another (a Spearman correlation, robust to the skew in both distributions) and brings the CMS counts in at the state level, the file's native geography. The script is deliberately defensive about column names and about dropping suppressed mortality cells, because both are the most common ways a county-level opioid join goes quietly wrong; the CMS UUID must be replaced with the current dataset id from the data.cms.gov catalog, since CMS re-versions its files.

import requests
import pandas as pd

# ---------------------------------------------------------------------
# Three public, key-free federal opioid sources, joined on geography.
#   1. DEA ARCOS    -- controlled-substance shipments to pharmacies,
#                      aggregated to county-year (the 2006-2014 release)
#   2. CDC overdose -- drug-poisoning deaths from death certificates,
#                      pulled from CDC WONDER / the NCHS data portal
#   3. CMS OTP      -- Medicare opioid-treatment-program providers,
#                      pulled LIVE from the data.cms.gov datastore API
# The ARCOS and CDC tables share a 5-digit FIPS county code. The CMS
# file is keyed by program (CCN/enrollment id) and carries state and
# address, so county must be derived from the address; the simplest
# tractable join brings CMS in at the STATE level.
# ---------------------------------------------------------------------

# --- 3. CMS opioid-treatment programs (LIVE public endpoint) ----------
# data.cms.gov publishes the "Opioid Treatment Program Providers" file
# through the datastore API -- no key required. Replace the UUID with the
# current dataset id from the data.cms.gov catalog; CMS re-versions its
# files and the id changes. Columns are provider name, NPI, address,
# phone, and enrollment status -- there is NO native FIPS column, so we
# count programs by STATE and derive county only when geocoding the
# address (a separate ZIP-to-county step, omitted here for clarity).
CMS_BASE = "https://data.cms.gov/data-api/v1/dataset"
OTP_UUID = "REPLACE_WITH_CURRENT_OTP_PROVIDERS_UUID"

def _find(cols, *needles):
    for c in cols:
        u = str(c).upper()
        if all(n.upper() in u for n in needles):
            return c
    return None

def load_cms_by_state(uuid=OTP_UUID, page_size=5000):
    rows, offset = [], 0
    while True:
        r = requests.get(f"{CMS_BASE}/{uuid}/data",
                         params={"size": page_size, "offset": offset},
                         timeout=120)
        r.raise_for_status()
        page = r.json()
        if not page:
            break
        rows.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    otp = pd.DataFrame(rows)
    otp.columns = [str(c).strip() for c in otp.columns]
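    # Prefer a column containing "STATE"; fall back only to a column named
    # exactly "ST", since a substring match on "ST" would also hit
    # unrelated columns such as a street or status field.
    st = _find(otp.columns, "STATE") or next(
        (c for c in otp.columns if c.upper() == "ST"), None)
    if st is None:
        raise KeyError("no state column found in the CMS OTP file")
    return otp.groupby(st).size().rename("otp_programs").to_frame()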

# --- 1. ARCOS county-year pill shipments (public flat-file release) ----
# The DEA / Washington Post release ships county-level totals as flat
# files; the canonical column is the number of pills (dosage units).
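def load_arcos(path):
    # Read every column as text so leading zeros in the FIPS code survive
    # regardless of header case, then normalize headers before lookup.
    a = pd.read_csv(path, dtype="string")
    a = a.rename(columns=lambda c: c.strip().upper())
    fips = a["BUYER_COUNTY_FIPS"].str.strip().str.zfill(5)
    # The pill count is DOSAGE_UNIT in some extracts and PILLS in others.
    pills_col = "DOSAGE_UNIT" if "DOSAGE_UNIT" in a.columns else "PILLS"
    pills = pd.to_numeric(a[pills_col], errors="coerce")
    return pd.DataFrame({"fips": fips, "state_fips": fips.str[:2],
                         "pills": pills})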

# --- 2. CDC drug-overdose deaths by county (WONDER / NCHS export) ------
# NCHS / CDC WONDER exports county-resident drug-poisoning deaths
# (ICD-10 X40-X44, X60-X64, X85, Y10-Y14). Suppressed cells (<10
# deaths) come through as a non-numeric flag and must be dropped.
def load_cdc(path):
    d = pd.read_csv(path, dtype={"County Code": "string"})
    d = d.rename(columns={"County Code": "fips", "Deaths": "deaths",
                          "Population": "pop"})
    d["fips"] = d["fips"].str.zfill(5)
    d["state_fips"] = d["fips"].str[:2]
    d["deaths"] = pd.to_numeric(d["deaths"], errors="coerce")
    d["pop"] = pd.to_numeric(d["pop"], errors="coerce")
    return d.dropna(subset=["deaths", "pop"])

def analyze(arcos_path, cdc_path):
    # ARCOS x CDC join on the shared county FIPS key.
    arcos = load_arcos(arcos_path).groupby("fips").agg(
        pills=("pills", "sum"), state_fips=("state_fips", "first"))
    cdc = load_cdc(cdc_path).set_index("fips")
    df = cdc.join(arcos[["pills"]], how="inner")

    # Per-capita pills and the crude death rate per 100k residents.
    df["pills_per_capita"] = df["pills"] / df["pop"]
    df["death_rate"] = df["deaths"] / df["pop"] * 100_000

    print(f"Counties joined (ARCOS x CDC): {len(df):,}")
    print("  Median pills per capita:   "
          f"{df['pills_per_capita'].median():,.1f}")
    print("  Median death rate / 100k:  "
          f"{df['death_rate'].median():,.1f}")

    # Do the pill-flooded counties track the death counties?
    r = df["pills_per_capita"].corr(df["death_rate"], method="spearman")
    print(f"  Spearman(pills/capita, death rate): {r:.3f}")

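    # Bring CMS in at the state level (its native geography). The CMS
    # counts are keyed by the file's state column (assumed here to be a
    # postal abbreviation) while by_state is keyed by 2-digit state FIPS,
    # so setting one against the other needs a FIPS-to-postal lookup that
    # is not included here; both frames are returned for that step.
    cms = load_cms_by_state()
    by_state = (cdc.reset_index().groupby("state_fips")
                .agg(deaths=("deaths", "sum"), pop=("pop", "sum")))
    by_state["death_rate"] = by_state["deaths"] / by_state["pop"] * 100_000
    print(f"  States with a Medicare-enrolled OTP count: {len(cms):,}")
    return df, cms, by_state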

# analyze("arcos_county.csv", "cdc_overdose_county.csv")

Two refinements separate this first pass from a defensible analysis. First, the correlation as written ignores time: it sets a county's total pill burden against its death rate in a single frame, whereas the causal question is whether high shipments in the prescription era precede elevated deaths in later years, which requires lagging the mortality series behind the ARCOS series and acknowledging that the fentanyl-driven deaths fall largely outside the window ARCOS covers. Second, the treatment metric counts programs, not capacity—a single large urban OTP and a single small rural one count the same—and it reflects only the Medicare-enrolled OTP universe, so it understates total treatment availability and should be read as a floor on the question of whether any certified program exists nearby, not as a measure of whether capacity is adequate; pushing the CMS file below the state level also requires the address-to-county geocoding step the script leaves out for clarity. The denominators, too, must be handled with care: per-capita pills and the death rate should share the same Census population estimates, and for multi-year work each year should use that year's estimate rather than a single fixed population.
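Setting the state-level program counts against the state-level death burden needs one more step: the mortality side is keyed by two-digit state FIPS, while a provider file keyed by address will usually carry a postal abbreviation. The sketch below shows that bridge with a three-state partial lookup and invented figures. The assumption that the CMS state column holds postal abbreviations, the function name state_frame, and the programs-per-million measure are all illustrative choices rather than part of the script above.

```python
# Partial lookup for illustration; a full analysis needs all states.
STATE_FIPS_TO_POSTAL = {"21": "KY", "39": "OH", "54": "WV"}

def state_frame(by_state, otp_by_postal):
    """Combine state deaths/population with OTP counts keyed by postal code."""
    out = {}
    for state_fips, rec in by_state.items():
        postal = STATE_FIPS_TO_POSTAL.get(state_fips)
        if postal is None:
            continue  # state not in the partial lookup
        programs = otp_by_postal.get(postal, 0)
        out[postal] = {
            "death_rate": rec["deaths"] / rec["pop"] * 100_000,
            "otp_programs": programs,
            "otp_per_million": programs / rec["pop"] * 1_000_000,
        }
    return out

# Invented figures, not real state totals.
frame = state_frame({"54": {"deaths": 900, "pop": 1_800_000}}, {"WV": 9})
print(frame)
```

Normalizing the program count by population keeps a large state from looking well served simply because it has more programs in absolute terms; it still counts programs rather than treatment slots.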

Limitations and analytical caveats

The three-dataset join is the most complete public view of the opioid epidemic available, but it rests on three administrative records never designed to be joined, and several structural limitations must be internalized before any conclusion is drawn.

Suppression silences the small counties. The CDC's suppression of county cells with fewer than ten deaths removes data precisely from the rural and small-population counties where the epidemic has often been most severe per capita. An analysis that simply drops suppressed cells will systematically under-count the rural toll and bias every cross-county comparison toward the larger counties that survive suppression. There are partial remedies—aggregating to multi-year periods or to larger geographies to push counts above the threshold—but there is no way to recover a suppressed county-year count exactly, and the honest analyst reports how much of the map went dark.
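The multi-year remedy amounts to changing the unit of analysis: a pooled count, requested from the source for the whole period, is divided by the person-years of population over the same period. The sketch below shows that arithmetic under the assumption that the pooled count is available; it does not and cannot reconstruct the suppressed single-year values. The function name, the threshold parameter, and the figures are illustrative.

```python
def pooled_rate(pooled_deaths, pop_by_year, threshold=10):
    """Deaths per 100k person-years over a pooled period, or None if too few."""
    if pooled_deaths < threshold:
        return None  # still below the suppression threshold after pooling
    person_years = sum(pop_by_year.values())
    return pooled_deaths / person_years * 100_000

# Invented county: 14 deaths over three years of roughly 5,000 residents.
population = {2012: 5_000, 2013: 5_100, 2014: 4_900}
print(pooled_rate(14, population))
```

The trade-off is resolution: the pooled rate describes the period as a whole, so a county that clears the threshold this way can no longer contribute to a year-by-year trend.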

The series do not cover the same years and geographies, and ARCOS misses the illicit supply. The public ARCOS release covers 2006–2014 and the legal supply chain of two opioids; the mortality data runs across the whole arc and continues; the CMS treatment file is a recent snapshot keyed by program address rather than by county. A clean three-way county-year join only exists for the overlap and only after the CMS addresses are resolved to counties, and crucially ARCOS captures the prescription wave but is blind to the heroin and illicitly manufactured fentanyl that drove the deadliest later years—those substances never moved through the registered distributors ARCOS reports. Treating the ARCOS pill burden as a proxy for total opioid exposure in the fentanyl era is therefore a category error: it measures the legal supply, which by the late 2010s had become the smaller part of the problem.

The ecological fallacy is ever-present. The join is at the county level, and the people who received the pills, the people who died, and the people in treatment are different—and only partly overlapping—populations. A county-level correlation between pill shipments and deaths does not establish that the individuals who received the pills are the individuals who died; inferring individual-level relationships from county aggregates is the textbook ecological fallacy. The joined data is powerful for describing geographic patterns and for prioritizing places, but it cannot, on its own, establish individual causation, and presenting a county correlation as if it were a personal-level finding overreads what the data can bear.

Residence, occurrence, and reporting lag distort the edges. The CDC counts deaths by county of residence, while a death may occur elsewhere, and a person dependent in one county may have obtained pills, or sought treatment, in another, so the three sources are not measuring exactly the same geographic event even when they share a FIPS code. Mortality data also lags: the most recent years are provisional and revised upward as late certificates and toxicology results arrive, so the leading edge of any trend understates the true toll. And drug specificity in the mortality data depends on whether the certifier named the drugs involved (a meaningful share of overdose certificates historically did not), so substance-specific counts (how many were fentanyl, how many heroin) are themselves undercounts of uncertain size.

Held with these caveats, the join of dea_arcos, cdc_overdose, and cms_opioid_treatment is uniquely valuable: a single geographic frame in which the supply, the death toll, and the treatment response of the opioid epidemic can finally be read against one another, the whole picture that no agency, working alone, was ever positioned to see.

Related writing

380 million transactions: indexing the DEA's ARCOS opioid distribution data — The supply-side source in depth: how the ARCOS transaction ledger is structured, what the court-ordered 2006–2014 release contains, and how the pharmacy-level pill counts roll up to the county FIPS key this analysis joins on.

CDC Drug Overdose Mortality Data: The Federal Dataset Behind the Opioid Crisis — The death side in depth: the death-certificate origin, the ICD-10 drug-poisoning codes, CDC WONDER and the NCHS portal, and the county-level suppression rule that shapes every mortality analysis this join depends on.

CMS Opioid Treatment Programs: The Federal Record of Medicare Access to Addiction Care — The response side in depth: what an opioid-treatment program is, the methadone and buprenorphine benefit that began in 2020, and the CMS provider file whose per-state program counts measure whether treatment followed the need.