
380 million transactions: indexing the DEA's ARCOS opioid distribution data

AI Analytics · 9 min read
Tags: Healthcare data, DEA, ARCOS, Opioid crisis, Open data

From 2006 to 2012, distributors shipped 76 billion oxycodone and hydrocodone pills across the United States. That number existed in a federal database for years before the public ever saw it. The DEA collected it, guarded it as law enforcement sensitive, and fought in court to keep it secret. It took a federal judge, a multidistrict litigation, and a Washington Post investigation to put those numbers on a page.

The database is ARCOS — the Automation of Reports and Consolidated Orders System. We've indexed it. This post explains what the dataset is, how it became public, what it contains, and how to query it against DEA enforcement and CDC overdose mortality.

What ARCOS is

ARCOS is the DEA's automated reporting system for controlled substance transactions in the US supply chain. It covers every Schedule I and II substance, along with Schedule III narcotics. The chain runs in one direction: manufacturer to distributor, then distributor to dispensers such as pharmacies, hospitals, and practitioners. Federal law requires manufacturers and distributors to report each of these transactions to the DEA. Sales from a pharmacy to a patient are not part of ARCOS. ARCOS has collected those reports since 1971.

The coverage is comprehensive by design. Any entity that manufactures, distributes, or dispenses Schedule I or II substances (opioids, stimulants, depressants) must register with the DEA, and manufacturers and distributors must report every transaction they make. The DEA issues each registrant a DEA registration number, which becomes the persistent identifier linking every shipment across the supply chain. The result is a closed-loop audit trail: every pill that enters the legal supply chain can, in principle, be traced from the manufacturer to the pharmacy, hospital, or practitioner that received it.
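Every join in this post keys on DEA registration numbers, so it can help to screen them before joining. A DEA number is commonly described as two letters followed by seven digits, the last of which is a check digit. To compute it, add the first, third, and fifth digits, then add twice the sum of the second, fourth, and sixth. The final digit of the total should equal the seventh digit.

The sketch below assumes that two-letter-plus-seven-digit shape. It flags numbers that do not match rather than correcting them. `is_valid_dea_number` is our own illustrative helper, not part of any DEA tooling.

```python
import re

# Assumed shape: two letters, then seven digits (the seventh is the check digit).
_DEA_PATTERN = re.compile(r"^[A-Z]{2}(\d{7})$")


def is_valid_dea_number(dea_no: str) -> bool:
    """Return True if dea_no has the assumed shape and a correct check digit.

    Rule: (d1 + d3 + d5) + 2 * (d2 + d4 + d6); the last digit of that
    total must equal d7.
    """
    match = _DEA_PATTERN.match(dea_no.strip().upper())
    if not match:
        return False  # wrong shape: flag for manual review
    d = [int(c) for c in match.group(1)]
    total = (d[0] + d[2] + d[4]) + 2 * (d[1] + d[3] + d[5])
    return total % 10 == d[6]
```

For example, the distributor number used in the API examples later in this post, PM0018425, passes the check: 0 + 1 + 4 + 2 × (0 + 8 + 2) = 25, whose final digit 5 matches the seventh digit.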

Why it was secret

The DEA treated ARCOS as law enforcement sensitive for decades. The agency argued, consistently, that disclosure would reveal investigative techniques: ARCOS data shows not just what was shipped but which distributors and pharmacies the DEA was watching. Releasing transaction-level data would let targets infer the contours of active investigations.

That position held until the opioid multidistrict litigation, In re: National Prescription Opiate Litigation (MDL 2804, Northern District of Ohio). Plaintiffs (states, counties, and municipalities suing opioid manufacturers and distributors) obtained ARCOS data through civil discovery. In 2019, after the Sixth Circuit ruled in favor of the media intervenors, U.S. District Judge Dan Polster, who presides over the MDL, allowed the data to be released. The first release covered 2006 through 2012. Data for 2013 and 2014 followed, so the public window now runs from 2006 through 2014, the period plaintiffs identified as the peak of prescription opioid oversupply.

The Washington Post and HD Media (publishers of the Huntington Herald-Dispatch) intervened in the litigation specifically to obtain the ARCOS data for publication. Their legal intervention is what produced the public dataset; without it, the data would have remained under the MDL protective order.

The data

The full ARCOS dataset for 2006–2014 contains approximately 380 million raw transaction records covering all Schedule I and II substances. Two access points exist:

  • Washington Post arcos R package (github.com/wpinvestigative/arcos) — convenience functions wrapping the Post's public API. Returns pre-aggregated county-level and pharmacy-level data for oxycodone and hydrocodone specifically (~178 million records of the full 380 million).
  • Notre Dame ARCOS portal (arcos.nd.edu) — provides full raw data download for researchers. Covers the complete Schedule I/II transaction universe, not just the two primary opioids.

The key fields in each transaction record:

REPORTER_DEA_NO       -- distributor DEA registration number
BUYER_DEA_NO          -- pharmacy DEA registration number
DRUG_CODE             -- DEA drug code (schedules I/II)
DRUG_NAME             -- e.g. "OXYCODONE", "HYDROCODONE"
TRANSACTION_DATE      -- date of shipment (YYYYMMDD)
CALC_BASE_WT_IN_GM    -- weight shipped in grams
MME_Conversion_Factor -- morphine milligram equivalent conversion
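To work with these fields in code, a raw row can be parsed into a typed record. This sketch makes two assumptions:

- Each row arrives as a dict of column name to string, as `csv.DictReader` produces.
- TRANSACTION_DATE uses the YYYYMMDD layout listed above. If your copy of the data uses a different date layout, change the `strptime` format string.

`ArcosTransaction` and `parse_transaction` are illustrative names, not part of any published ARCOS tooling.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class ArcosTransaction:
    reporter_dea_no: str    # REPORTER_DEA_NO: shipping registrant
    buyer_dea_no: str       # BUYER_DEA_NO: receiving registrant
    drug_code: str          # DRUG_CODE
    drug_name: str          # DRUG_NAME, upper-cased, e.g. "OXYCODONE"
    transaction_date: date  # TRANSACTION_DATE, assumed YYYYMMDD
    base_weight_gm: float   # CALC_BASE_WT_IN_GM
    mme_factor: float       # MME_Conversion_Factor


def parse_transaction(row: dict[str, str]) -> ArcosTransaction:
    """Convert one raw row (column name -> string) into typed fields."""
    return ArcosTransaction(
        reporter_dea_no=row["REPORTER_DEA_NO"].strip(),
        buyer_dea_no=row["BUYER_DEA_NO"].strip(),
        drug_code=row["DRUG_CODE"].strip(),
        drug_name=row["DRUG_NAME"].strip().upper(),
        transaction_date=datetime.strptime(
            row["TRANSACTION_DATE"].strip(), "%Y%m%d"
        ).date(),
        base_weight_gm=float(row["CALC_BASE_WT_IN_GM"]),
        mme_factor=float(row["MME_Conversion_Factor"]),
    )
```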

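MME (morphine milligram equivalents) is the standard unit for comparing opioid potency across drug types. Conversion is applied at the transaction level: MME = CALC_BASE_WT_IN_GM * 1000 * MME_Conversion_Factor. In other words, the shipped weight is converted from grams to milligrams and multiplied by the drug's conversion factor. For a single tablet, MME = strength in mg * conversion factor. For reference, oxycodone's factor is 1.5, so one 5 mg oxycodone tablet = 7.5 MME. Hydrocodone's factor is 1.0, so one 5 mg hydrocodone tablet = 5 MME.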
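A minimal sketch of both conversions, with illustrative function names:

- **Record level:** total MME = CALC_BASE_WT_IN_GM × 1000 × MME_Conversion_Factor.
- **Tablet level:** MME = strength in mg × factor.

The factors in `MME_FACTORS` are the CDC conversion factors for oxycodone (1.5) and hydrocodone (1.0). The field name CALC_BASE_WT_IN_GM suggests base rather than salt weight. If so, weight-based totals may differ slightly from tablet counts multiplied by labeled strengths.

```python
# CDC MME conversion factors for the two drugs discussed in this post.
MME_FACTORS = {"OXYCODONE": 1.5, "HYDROCODONE": 1.0}


def record_mme(calc_base_wt_in_gm: float, mme_conversion_factor: float) -> float:
    """Total MME for one ARCOS record: grams -> milligrams, times the factor."""
    return calc_base_wt_in_gm * 1000 * mme_conversion_factor


def tablet_mme(strength_mg: float, drug_name: str) -> float:
    """MME contributed by one tablet of the given strength."""
    return strength_mg * MME_FACTORS[drug_name.upper()]
```

A record of 2 g oxycodone base weight works out to 2,000 mg × 1.5 = 3,000 MME.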

What it revealed

The Washington Post's “The Opioid Files” investigation, which drove the ARCOS release and publication, produced findings that had been hiding in plain sight inside the DEA's system for years:

  • 76 billion pills. Between 2006 and 2012, distributors shipped 76 billion oxycodone and hydrocodone pills to pharmacies and dispensers across the country. The number is now so widely cited it has become a baseline for policy discussions, but it was unknown to the public until 2019.
  • A handful of distributors dominated. The Post found that six companies (McKesson, Walgreens, Cardinal Health, AmerisourceBergen, CVS, and Walmart) distributed about three-quarters of the pills over the period. McKesson was the largest single distributor.
  • Rural Appalachian concentration. Counties receiving the most pills per capita were overwhelmingly rural Appalachian communities. Mingo County, West Virginia received 203 oxycodone and hydrocodone pills per resident per year in 2008 — for a county of about 26,000 people, that is roughly 5.3 million pills annually for two drugs alone.
  • Nine pharmacies, two towns. Nine pharmacies in Williamson and Kermit, West Virginia — two towns with a combined population of roughly 3,000 — ordered enough pills to supply every person in those towns with more than 150 pills per year. Multiple pharmacies ordered volumes inconsistent with any legitimate patient population.
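The per-capita figures above come down to one division and its inverse. A minimal sketch with hypothetical helper names. `total_pills` is assumed to be a pill (dosage unit) count summed from your extract, not the gram weights in CALC_BASE_WT_IN_GM.

```python
def pills_per_capita(total_pills: float, population: int, years: int = 1) -> float:
    """Average pills per resident per year."""
    if population <= 0 or years <= 0:
        raise ValueError("population and years must be positive")
    return total_pills / population / years


def implied_annual_pills(per_capita_per_year: float, population: int) -> float:
    """Invert the ratio: annual pill volume implied by a per-resident rate."""
    return per_capita_per_year * population
```

For Mingo County, `implied_annual_pills(203, 26_000)` gives 5,278,000, the roughly 5.3 million pills cited above.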

The post-2014 gap

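The public transaction-level release covers 2006–2014 only. The DEA has not released transaction-level ARCOS data for 2015 onward; it publishes only aggregate ARCOS retail drug summary reports. This gap persists even though the opioid crisis accelerated sharply through 2017 as fentanyl entered the illicit supply chain at scale.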

The 2006–2014 window captures the hydrocodone and oxycodone peak: the period when legal prescription volume was at its highest and pill mills were operating most visibly. What it misses is the transition. Heroin deaths rose from around 2010. From about 2013, illicitly manufactured fentanyl began displacing both heroin and diverted prescription pills in the illicit supply. By 2017, fentanyl and fentanyl analogs were involved in the majority of opioid overdose deaths nationally. That transition is not in the ARCOS data.

The practical consequence: ARCOS is authoritative for the prescription opioid era but cannot be used to analyze the fentanyl crisis. Cross-referencing ARCOS pill volumes against CDC overdose mortality data requires care about the time lag — the mortality signal for 2015–2022 is driven by illicit supply, not legal prescription volume.

Using the data

The Washington Post's arcos R package provides the fastest path to county-level and pharmacy-level analysis:

library(arcos)

# Raw transaction records for pharmacies in Mingo County (oxycodone + hydrocodone)
county_raw(county = "Mingo", state = "WV", key = "WaPo")

# Pills per pharmacy (BUYER_DEA_NO level)
pharmacy_raw(state = "WV", key = "WaPo")

# County population for per-capita normalization
county_population(county = "Mingo", state = "WV", key = "WaPo")

For full transaction-level analysis — including non-opioid Schedule I/II substances and distributor-level aggregations — the Notre Dame raw data download is required. The full dataset is large; partitioning by state or drug code before loading is advisable.
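One way to partition without loading the whole file is to stream it and keep only running totals. This sketch assumes a tab-delimited extract with a header row and a BUYER_STATE column, which is not among the fields listed earlier. Check the column names in your download before relying on it. `weight_by_reporter` is an illustrative name.

```python
import csv
from collections import defaultdict
from typing import Iterable


def weight_by_reporter(lines: Iterable[str], state: str, drug_name: str) -> dict[str, float]:
    """Stream a tab-delimited extract row by row and total CALC_BASE_WT_IN_GM
    per REPORTER_DEA_NO for one buyer state and one drug.

    Only the running totals are kept in memory, never the whole file.
    BUYER_STATE is an assumed column name.
    """
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["BUYER_STATE"] != state or row["DRUG_NAME"] != drug_name:
            continue  # outside the partition of interest
        totals[row["REPORTER_DEA_NO"]] += float(row["CALC_BASE_WT_IN_GM"])
    return dict(totals)
```

Pass it an open file handle, opened with `newline=""` as the csv module documentation recommends, for example `weight_by_reporter(f, "WV", "OXYCODONE")`. Memory use grows with the number of distinct reporters, not the number of rows.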

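MME conversion at scale: apply the MME_Conversion_Factor field per record rather than a drug-level lookup table. The released data carries a conversion factor on each record, so products whose factors differ within a drug are converted correctly. For example, CDC's conversion table assigns transdermal fentanyl patches and transmucosal fentanyl products different factors. For oxycodone and hydrocodone the factor does not depend on release type: CDC uses 1.5 for oxycodone and 1.0 for hydrocodone, whether immediate- or extended-release.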
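The difference between the two approaches shows up as soon as one drug name spans products with different factors. This sketch uses a made-up drug, DRUG_X, with two hypothetical products whose factors are 2.0 and 4.0. The function names are illustrative.

```python
from collections import defaultdict
from typing import Iterable

# Each record: (drug_name, calc_base_wt_in_gm, mme_conversion_factor).
Record = tuple[str, float, float]


def total_mme_by_drug(records: Iterable[Record]) -> dict[str, float]:
    """Sum MME per drug using each record's own conversion factor."""
    totals: dict[str, float] = defaultdict(float)
    for drug_name, weight_gm, factor in records:
        totals[drug_name] += weight_gm * 1000 * factor
    return dict(totals)


def total_mme_with_lookup(records: Iterable[Record], lookup: dict[str, float]) -> dict[str, float]:
    """The shortcut to avoid: one factor per drug, ignoring per-record factors."""
    totals: dict[str, float] = defaultdict(float)
    for drug_name, weight_gm, _factor in records:
        totals[drug_name] += weight_gm * 1000 * lookup[drug_name]
    return dict(totals)
```

With one gram shipped of each DRUG_X product, the per-record version returns 6,000 MME. A lookup table that assigns DRUG_X a single factor of 2.0 returns 4,000.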

Cross-reference: ARCOS + DEA enforcement + CDC mortality

ARCOS becomes most analytically useful when joined to external datasets. Two cross-references are available directly through the hub:

ARCOS distributor DEA numbers + DEA enforcement actions

Each ARCOS transaction carries REPORTER_DEA_NO — the distributor's DEA registration number. The hub's DEA enforcement actions dataset (accessible as dea-actions) indexes DEA administrative actions by the same DEA registration identifier. The join is direct:

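-- Which distributors shipped the most drug weight, and how many DEA actions did each receive?
-- Actions are counted in a subquery first: joining the raw tables directly would repeat
-- each shipment row once per matching action and inflate the weight total.
SELECT
  a.REPORTER_DEA_NO,
  SUM(a.CALC_BASE_WT_IN_GM)            AS total_weight_gm,
  COALESCE(MAX(e.dea_action_count), 0) AS dea_action_count
FROM arcos_transactions a
LEFT JOIN (
  SELECT dea_reg_no, COUNT(DISTINCT action_id) AS dea_action_count
  FROM dea_actions
  GROUP BY dea_reg_no
) e ON a.REPORTER_DEA_NO = e.dea_reg_no
GROUP BY a.REPORTER_DEA_NO
ORDER BY total_weight_gm DESC;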

This join makes the enforcement timeline measurable. For each high-volume registrant it shows whether DEA administrative actions followed. Adding action dates alongside shipment dates shows how long after peak volume those actions came. That temporal gap between pill volume and enforcement action is itself an analytical finding.
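To check the join logic before running it on hundreds of millions of rows, the query can be tried against a tiny in-memory SQLite database. This is a sketch under two assumptions:

- The registration numbers are made up.
- The `action_date` column on `dea_actions` is an assumption about that table's schema, not something documented in this post.

The query also reports the last shipment year and first action year per registrant, so the temporal gap can be read directly.

```python
import sqlite3

# Per-registrant totals. DEA actions are aggregated in a subquery first so that
# a registrant with several actions does not have its shipment rows repeated.
SUMMARY_SQL = """
SELECT a.REPORTER_DEA_NO,
       SUM(a.CALC_BASE_WT_IN_GM)                              AS total_weight_gm,
       MAX(CAST(substr(a.TRANSACTION_DATE, 1, 4) AS INTEGER)) AS last_ship_year,
       COALESCE(MAX(e.n_actions), 0)                          AS dea_action_count,
       MAX(e.first_action_year)                               AS first_action_year
FROM arcos_transactions a
LEFT JOIN (
    SELECT dea_reg_no,
           COUNT(DISTINCT action_id)                       AS n_actions,
           MIN(CAST(substr(action_date, 1, 4) AS INTEGER)) AS first_action_year
    FROM dea_actions
    GROUP BY dea_reg_no
) e ON a.REPORTER_DEA_NO = e.dea_reg_no
GROUP BY a.REPORTER_DEA_NO
ORDER BY total_weight_gm DESC
"""


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with made-up rows and hypothetical registration numbers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE arcos_transactions (
            REPORTER_DEA_NO TEXT, TRANSACTION_DATE TEXT, CALC_BASE_WT_IN_GM REAL);
        -- action_date is an assumed column holding YYYY-MM-DD text
        CREATE TABLE dea_actions (action_id TEXT, dea_reg_no TEXT, action_date TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO arcos_transactions VALUES (?, ?, ?)",
        [("RA0000001", "20080115", 100.0),
         ("RA0000001", "20120301", 50.0),
         ("RB0000002", "20100601", 80.0)],
    )
    conn.executemany(
        "INSERT INTO dea_actions VALUES (?, ?, ?)",
        [("A1", "RA0000001", "2016-05-01"),
         ("A2", "RA0000001", "2017-02-10")],
    )
    return conn


def distributor_summary(conn: sqlite3.Connection) -> list[tuple]:
    """Rows of (reporter, total grams, last shipment year, action count, first action year)."""
    return conn.execute(SUMMARY_SQL).fetchall()
```

On these toy rows, RA0000001 totals 150 g with two actions. The first action came in 2016, four years after its last shipment in 2012. Joining the two tables directly, without the pre-aggregating subquery, reports 300 g for RA0000001, because each shipment row is repeated once per matching action.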

ARCOS pharmacy data + CDC overdose mortality

ARCOS county-level pill volume shows a predictable relationship with CDC overdose mortality rates, but the lag matters. Research published following the ARCOS release generally finds a 3–5 year lag between peak pill-per-capita volume in a county and peak overdose mortality — reflecting the time from initial prescription opioid exposure to dependence, and then the transition to illicit supply when prescriptions tighten.

Counties with the highest pills-per-capita in ARCOS (2006–2010) show elevated overdose mortality in CDC WONDER data for 2010–2016. The correlation weakens after 2016 as illicit fentanyl supply becomes the primary mortality driver and the geographic distribution of overdose deaths shifts away from the original prescription opioid hotspots.

Access

The indexed ARCOS dataset is available at https://api.ai-analytics.org/datasets/arcos-opioid-distribution. The endpoint supports county-level, pharmacy-level, and distributor-level aggregations with MME normalization and optional cross-reference to DEA enforcement actions.

# County-level pills per year (Mingo County, WV: FIPS 54059)
curl "https://api.ai-analytics.org/datasets/arcos-opioid-distribution?level=county&fips=54059&years=2006-2014"

# Distributor volume with DEA enforcement cross-reference
curl "https://api.ai-analytics.org/datasets/arcos-opioid-distribution?level=distributor&dea_no=PM0018425&include=dea_actions"

# Pharmacy-level for a given state
curl "https://api.ai-analytics.org/datasets/arcos-opioid-distribution?level=pharmacy&state=WV&drug=OXYCODONE"
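From a script, building the query string with a URL encoder avoids the quoting problems that raw `&` characters cause in a shell. A minimal sketch using only the Python standard library, under two assumptions:

- The endpoint returns JSON.
- The parameter names match the curl examples above; they are copied from those examples, not from a published API reference.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.ai-analytics.org/datasets/arcos-opioid-distribution"


def build_url(**params: str) -> str:
    """Build a query URL; urlencode percent-escapes each parameter value."""
    return f"{BASE_URL}?{urlencode(params)}"


def fetch(**params: str):
    """GET the endpoint and decode the body, assuming the response is JSON."""
    with urlopen(build_url(**params), timeout=30) as resp:
        return json.load(resp)
```

For example, `fetch(level="pharmacy", state="WV", drug="OXYCODONE")` requests the same data as the third curl command.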

For the CMS Open Payments and Part D prescriber data that complements ARCOS on the payment and prescribing side of the opioid supply chain: CMS Open Payments and Part D: mapping financial relationships to prescribing patterns →

For the FEMA NFIP flood claims dataset — another large federal dataset released through litigation and FOIA pressure, with similar issues around data completeness and post-release analysis: FEMA NFIP flood claims: 2.5 million policies, 50 years of loss data →