Technical writing

CMS Medicare Part D Prescriber Data: The Federal Database Behind Drug Spending for 1 Million Providers

· AI Analytics
CMSMedicarePart DDrug PrescribingFederal Data

CMS publishes annual Medicare Part D prescriber-level drug spending data for every provider who prescribed drugs covered under Medicare—enabling researchers and journalists to identify outlier prescribers, track opioid prescribing patterns, and analyze drug spending by specialty and geography.

What Medicare Part D is

The Medicare Prescription Drug Benefit—Part D—was enacted by the Medicare Prescription Drug, Improvement, and Modernization Act of 2003 (commonly called the Medicare Modernization Act, or MMA) and took effect January 1, 2006. It created, for the first time, a federally subsidized outpatient prescription drug benefit for Medicare beneficiaries. Enrollment is voluntary; as of 2023, roughly 48 million beneficiaries are enrolled, making Part D one of the largest drug insurance programs in the world by covered population.

Part D is not administered directly by the federal government. Instead, CMS contracts with private “Part D plan sponsors” who design and sell drug coverage to beneficiaries. These sponsors fall into two categories: stand-alone Prescription Drug Plans (PDPs), which pair with traditional Medicare Parts A and B; and Medicare Advantage Prescription Drug plans (MAPDs), which bundle medical and drug coverage together. Plan sponsors negotiate formularies—lists of covered drugs and their cost-sharing tiers—with drug manufacturers and pharmacy benefit managers (PBMs), who administer the benefit on the sponsors' behalf.

The standard Part D benefit structure has several phases. Enrollees pay a monthly premium (plan-specific), a deductible (up to a CMS cap, $545 in 2024), an initial coverage phase where cost-sharing applies at negotiated rates, and historically a coverage gap known as the “donut hole,” in which beneficiaries faced sharply higher out-of-pocket costs. The Affordable Care Act phased out the donut hole for brand drugs by 2020, and the Inflation Reduction Act of 2022 capped Part D out-of-pocket costs at $2,000 per year beginning in 2025—a structural change to the benefit that eliminates catastrophic cost exposure for high-drug-cost enrollees. After total drug spending exceeds the catastrophic threshold (approximately $8,000 out-of-pocket in 2024), Medicare pays 80% of drug costs directly. Low-Income Subsidy (LIS) enrollees—also called “Extra Help”—receive substantially reduced premiums, deductibles, and cost-sharing based on income and asset eligibility criteria.

The CMS Part D prescriber dataset

CMS publishes the Medicare Part D Prescribers by Provider and Drug dataset annually on data.cms.gov. Data typically appears approximately 18 months after the end of the calendar year it covers: the 2022 program year data, for example, became available in mid-2024. The dataset is one of the most analytically powerful public records CMS produces, because it links individual prescriber identities to specific drug names and spending totals at a level of granularity unavailable from any other public source.

Each row in the file represents a single prescriber–drug combination. A physician who prescribed both oxycodone and atorvastatin will appear in two separate rows. Each annual file covers more than one million distinct prescribers and more than 5,000 distinct drug names, yielding approximately 25 million rows per program year. The file is available via Socrata API and direct CSV download from data.cms.gov.

The schema is consistent across annual releases, though CMS revises field definitions and suppression thresholds occasionally. Key fields include:

  • prscrbr_npi: the prescriber's 10-digit National Provider Identifier—the universal provider key used to join to NPPES, Open Payments, and other CMS datasets
  • prscrbr_last_org_name and prscrbr_first_name: prescriber name as registered in the NPI system
  • prscrbr_city and prscrbr_state_abrvtn: geographic location of the prescriber's primary practice address
  • prscrbr_type: provider specialty or type (e.g., “Internal Medicine,” “Pain Management,” “Family Practice”)
  • gnrc_name: the generic drug name (e.g., “oxycodone hcl”); the primary field for drug-level filtering
  • brnd_name: the brand drug name associated with the fill (e.g., “OxyContin”)
  • tot_clms: total Part D claims (prescription fills) for this prescriber–drug combination in the program year
  • tot_30day_fills: total claims standardized to 30-day supply equivalents, enabling comparisons across different supply durations
  • tot_drug_cst: total drug cost in dollars, including amounts paid by the plan, the beneficiary, and any other payers; represents the gross ingredient cost before rebates
  • tot_benes: count of distinct beneficiaries who filled at least one prescription from this prescriber for this drug
  • ge65_tot_clms: subset of total claims for beneficiaries aged 65 and older
  • bene_race_nonhsp_wht_cnt, bene_race_nonhsp_blk_cnt, and related fields: race and ethnicity breakdowns of beneficiaries, published subject to suppression rules when cell counts are small, enabling equity-focused research

CMS suppresses cells where fewer than 11 beneficiaries are associated with a prescriber–drug combination, consistent with its standard privacy protection thresholds. Analysts should treat suppressed rows as missing rather than zero, which matters for aggregate calculations in low-volume prescriber or drug strata.

Notable research and investigative uses

The Part D prescriber dataset has become a standard data source for investigative journalism and academic health services research precisely because it names individual providers and links them to specific drug spending. The following uses illustrate its analytical range.

ProPublica's “Dollars for Docs” cross-referenced Part D prescribing data with CMS Open Payments records to identify physicians whose prescribing patterns correlated with payments from the companies whose drugs they prescribed most heavily. The join is performed on the prescriber NPI, which appears in both datasets. ProPublica's interactive tool allowed the public to look up any Medicare prescriber, see their total drug spending by drug name, and compare it to payments received from the manufacturer of those drugs—surfacing, for example, cardiologists who were among the highest prescribers of a PCSK9 inhibitor and who had received substantial consulting fees from its manufacturer.

Opioid prescribing investigations drew heavily on Part D data. CMS itself used the prescriber file to identify the top 1% of opioid prescribers nationally—providers whose opioid claim volumes were statistical outliers relative to their specialty and geographic peers. The DEA cross-referenced these CMS-identified outliers against its own registration and enforcement records, leading to targeted investigations. Journalists at the Los Angeles Times, Wall Street Journal, and national public radio used the dataset to report on specific physicians who appeared in Part D data as extraordinary outliers in opioid prescribing volume, sometimes prescribing at ten times or more the average for their specialty.

Academic research on drug pricing has used Part D prescriber data to study brand versus generic prescribing rates across payer types, specialties, and regions. Studies have examined whether physicians in academic medical centers prescribe generics at different rates than those in community settings; whether prescriber specialty explains variation in brand-name drug costs for conditions with therapeutically equivalent generic alternatives; and how geographic variation in prescribing patterns interacts with plan formulary design. Because the dataset covers all prescribers who submitted Part D claims—not a sample—these analyses are conducted on population-level data rather than survey estimates.

Fraud detection is among CMS's own primary uses of the dataset. Outlier prescribers—those whose drug combinations, volumes, or cost patterns deviate substantially from specialty and geographic benchmarks—are flagged for review by the CMS Center for Program Integrity and the HHS Office of Inspector General. Controlled substance fraud patterns in Part D have included prescribing to deceased or fictitious beneficiaries, prescribing quantities incompatible with any legitimate treatment regimen, and coordinated schemes between prescribers and pharmacies operating as “pill mills.”

High-intensity prescribers and fraud enforcement

The Part D prescriber dataset has been central to federal enforcement actions against fraudulent prescribers because it provides a systematic, quantitative basis for identifying outliers that would be impossible to surface through case-by-case investigation alone. CMS and OIG have developed statistical models that flag prescribers whose patterns—opioid claim rates, controlled substance concentration, beneficiary-to-claim ratios—fall outside expected ranges for their specialty and geography.

Operation Bended Knee, a joint DOJ and HHS-OIG enforcement action, targeted pain clinics operating as cash-payment controlled substance dispensaries. Clinic operators paid physicians to sign pre-printed prescriptions for oxycodone and other Schedule II drugs without conducting clinical examinations; the prescriptions were filled at affiliated pharmacies and billed to Part D. The prescribers involved appeared in Part D data with opioid claim volumes orders of magnitude above their specialty averages—a pattern that CMS and OIG use as a primary detection signal.

In 2014, CMS established the Part D Preclusion List, which bars prescribers who have been excluded from Medicare or Medicaid participation, or who have been subject to certain revocation actions, from having their prescriptions reimbursed under Part D. Beginning in January 2019, CMS expanded the preclusion list to cover prescribers whose participation rights were revoked for cause, prescribers on the OIG exclusion list, and individuals who, even if never enrolled in Medicare, have been convicted of Medicare or Medicaid fraud. Part D plan sponsors are required to reject claims from preclusion list prescribers.

The “pill mill” prescriber pattern in Part D data is quantitatively distinctive: some prescribers appearing in the dataset show opioid claim counts more than ten times the 90th percentile for their reported specialty, concentrated in a narrow set of controlled drugs, with beneficiary counts that imply each patient is receiving multiple controlled substance prescriptions per month. CMS publishes aggregate statistics on prescriber outlier identification but does not publish the individual prescriber flagging scores, which are used in program integrity work rather than public reporting.

Specialty drug spending and the cost concentration problem

While opioid prescribing has attracted the most investigative attention, the Part D prescriber data also illuminates the concentration of high-cost specialty drug spending within Medicare. Part D plans designate a “specialty tier” for drugs with a monthly cost exceeding approximately $800 (the threshold is plan-specific and CMS-regulated), at which beneficiaries face the highest cost-sharing rates. The drugs in this tier account for a disproportionate share of total Part D spending.

Hepatitis C direct-acting antivirals illustrated the fiscal impact of specialty drug approvals on Part D. Harvoni (ledipasvir/sofosbuvir), approved in 2014, carried a list price of approximately $84,000 for a 12-week course. A single prescriber who treated a large hepatitis C panel could generate millions of dollars in Part D drug cost in a single year. Part D prescriber data allowed CMS, researchers, and policymakers to quantify the concentration of hepatitis C treatment spending by prescriber specialty and geography, and to track the rate at which patients were being treated after curative drugs became available.

Immunology and oncology drugs similarly dominate Part D cost-per-claim statistics. Humira (adalimumab), Keytruda (pembrolizumab), and Eliquis (apixaban) have appeared among the highest-spending drugs in the Part D prescriber dataset in recent program years. The prescriber data allows analysis of which specialties account for spending on these drugs, how geographic access affects utilization, and how cost trends evolve year-over-year as biosimilar competition affects brand-name utilization.

The Inflation Reduction Act of 2022 introduced two structural changes that will affect Part D spending patterns visible in future prescriber data releases. First, Medicare gained the authority to negotiate prices directly with manufacturers for a defined set of high-spending drugs—initially 10 drugs for 2026, expanding in subsequent years. This negotiation authority, unprecedented in the Part D program, is expected to reduce gross drug costs for the negotiated drugs and shift the spending distribution. Second, the $2,000 annual out-of-pocket cap beginning in 2025 will eliminate the catastrophic coverage phase as a distinct financial threshold, changing the beneficiary cost-sharing pattern for high-cost drug users.

A persistent analytical issue in the Part D prescriber data is the distinction between Part B and Part D drug coverage. Physician-administered drugs—infusions, injections administered in a clinical setting—are generally reimbursed under Medicare Part B, not Part D. Self-administered drugs that a patient picks up at a pharmacy fall under Part D. This distinction means that the Part D prescriber dataset does not capture the full drug prescribing picture for specialties that rely heavily on infused therapies: an oncologist whose patients primarily receive chemotherapy infusions in-office will appear to prescribe fewer drugs in Part D than their actual treatment volume implies, because the infused agents are billed under Part B. Comprehensive prescribing analysis for these specialties requires joining Part D prescriber data with the CMS Part B drug spending dataset.

Data access

The primary access point is data.cms.gov, where CMS publishes the Part D prescriber dataset via the Socrata platform. The dataset “Medicare Part D Prescribers by Provider and Drug” is available at:

https://data.cms.gov/provider-summary-by-type-of-service/medicare-part-d-prescribers/medicare-part-d-prescribers-by-provider-and-drug

Multiple program years are available as separate datasets, each with its own Socrata dataset ID. The Socrata API supports server-side filtering via SoQL query parameters. A state filter looks like:

?$where=prscrbr_state_abrvtn='TX'

Drug name filtering uses a LIKE clause against the gnrc_name field:

?$where=LOWER(gnrc_name) LIKE '%oxycodone%'

The CMS Summary Statistics on Use and Payments also includes a drug-level aggregation (aggregated across all prescribers for a drug, without individual prescriber detail) at:

https://data.cms.gov/summary-statistics-on-use-and-payments/medicare-medicaid-spending-by-drug/medicare-part-d-spending-by-drug

Provider identity enrichment is available via the NPPES NPI Registry API at https://npiregistry.cms.hhs.gov/api/, which returns specialty (via taxonomy codes), practice address, and provider type for any NPI. Because both the Part D prescriber dataset and NPPES use NPI as the primary identifier, joins are straightforward.

ProPublica maintained a legacy Part D API at https://projects.propublica.org/api-docs/data-sources/medicationsthat provided prescriber-level summaries. This API was built on top of the CMS data and is useful for understanding the analytical patterns ProPublica applied, though direct CMS data.cms.gov access is preferable for current research given the annual update cadence on the official source.

Python: opioid prescribing analysis via the Socrata API

The following Python script fetches CMS Part D prescriber data for opioid drugs (oxycodone, hydrocodone, fentanyl, and morphine) from the data.cms.gov Socrata API and performs five analyses: total opioid claims by state; top 20 highest-volume opioid prescribers nationally; cost per claim versus specialty average; brand versus generic opioid prescribing rate by state; and year-over-year change in opioid prescribing from 2019 through 2022, capturing the COVID-19 effect on prescribing patterns. Requirements: requests andpandas. No API key is required for the Socrata endpoints at typical query volumes.

import requests
import pandas as pd
import io

# ---------------------------------------------------------------------------
# CMS Medicare Part D Prescribers -- Socrata API
# Dataset: "Medicare Part D Prescribers by Provider and Drug"
# Source:  https://data.cms.gov/provider-summary-by-type-of-service/
#          medicare-part-d-prescribers/medicare-part-d-prescribers-by-provider-and-drug
#
# Each row = one prescriber NPI x one generic drug name combination.
# Key fields used here:
#   prscrbr_npi           -- 10-digit National Provider Identifier
#   prscrbr_last_org_name -- prescriber last name or org name
#   prscrbr_first_name    -- prescriber first name
#   prscrbr_city          -- city of practice
#   prscrbr_state_abrvtn  -- 2-letter state abbreviation
#   prscrbr_type          -- specialty/provider type
#   gnrc_name             -- generic drug name
#   brnd_name             -- brand drug name
#   tot_clms              -- total Part D claims
#   tot_30day_fills       -- total 30-day equivalent fills
#   tot_drug_cst          -- total drug cost (dollars)
#   tot_benes             -- distinct beneficiaries
#   ge65_tot_clms         -- claims for beneficiaries 65+
#
# Socrata dataset IDs change with each annual release; verify at data.cms.gov.
# The ID below is for the 2022 program year release.
# ---------------------------------------------------------------------------

SOCRATA_BASE = "https://data.cms.gov/resource"
DATASET_ID   = "9n45-2f87"    # Part D Prescribers by Provider and Drug, 2022

# Opioid generic drug name fragments to match against gnrc_name
OPIOID_FILTERS = [
    "oxycodone",
    "hydrocodone",
    "fentanyl",
    "morphine",
]


# ---------------------------------------------------------------------------
# 1. Fetch opioid prescriber rows from the Socrata API
# ---------------------------------------------------------------------------

def fetch_opioid_rows(limit: int = 500000) -> pd.DataFrame:
    """
    Query the Part D prescriber dataset for each opioid drug name fragment.
    The Socrata LIKE filter is applied server-side; results are concatenated.
    """
    frames = []
    for drug in OPIOID_FILTERS:
        url = f"{SOCRATA_BASE}/{DATASET_ID}.csv"
        params = {
            "$where": f"LOWER(gnrc_name) LIKE '%{drug}%'",
            "$limit": limit,
            "$select": (
                "prscrbr_npi,prscrbr_last_org_name,prscrbr_first_name,"
                "prscrbr_city,prscrbr_state_abrvtn,prscrbr_type,"
                "gnrc_name,brnd_name,tot_clms,tot_30day_fills,tot_drug_cst,tot_benes"
            ),
        }
        resp = requests.get(url, params=params, timeout=180)
        resp.raise_for_status()
        df = pd.read_csv(io.StringIO(resp.text), dtype=str, low_memory=False)
        df["opioid_filter"] = drug
        frames.append(df)
        print(f"  {drug}: {len(df):,} rows")

    combined = pd.concat(frames, ignore_index=True)
    # Drop duplicate NPI x gnrc_name rows that may arise from overlapping filters
    combined = combined.drop_duplicates(subset=["prscrbr_npi", "gnrc_name"])
    print(f"Total deduplicated opioid rows: {len(combined):,}")
    return combined


def to_numeric_cols(df: pd.DataFrame) -> pd.DataFrame:
    for col in ["tot_clms", "tot_30day_fills", "tot_drug_cst", "tot_benes"]:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors="coerce")
    return df


# ---------------------------------------------------------------------------
# 2. Total opioid claims by state
# ---------------------------------------------------------------------------

def opioid_claims_by_state(df: pd.DataFrame) -> None:
    state_agg = (
        df.groupby("prscrbr_state_abrvtn")
        .agg(
            total_claims=("tot_clms", "sum"),
            total_cost=("tot_drug_cst", "sum"),
            prescribers=("prscrbr_npi", "nunique"),
        )
        .sort_values("total_claims", ascending=False)
        .head(15)
    )
    print("\n=== Top 15 States: Total Opioid Claims (Part D 2022) ===")
    print(f"  {'State':<6}  {'Claims':>12}  {'Cost ($)':>14}  {'Prescribers':>12}")
    print("  " + "-" * 48)
    for state, row in state_agg.iterrows():
        print(
            f"  {state:<6}  {row['total_claims']:>12,.0f}  "
            f"${row['total_cost']:>13,.0f}  {row['prescribers']:>12,}"
        )


# ---------------------------------------------------------------------------
# 3. Top 20 highest-volume opioid prescribers nationally
# ---------------------------------------------------------------------------

def top_opioid_prescribers(df: pd.DataFrame, n: int = 20) -> None:
    prescriber_agg = (
        df.groupby(
            ["prscrbr_npi", "prscrbr_last_org_name", "prscrbr_first_name",
             "prscrbr_state_abrvtn", "prscrbr_type"],
            as_index=False,
        )
        .agg(
            total_claims=("tot_clms", "sum"),
            total_cost=("tot_drug_cst", "sum"),
            distinct_drugs=("gnrc_name", "nunique"),
        )
        .sort_values("total_claims", ascending=False)
        .head(n)
    )
    print(f"\n=== Top {n} Opioid Prescribers by Total Claims ===")
    print(f"  {'NPI':<12}  {'Last Name':<22}  {'State':<5}  {'Specialty':<28}  {'Claims':>8}  {'Cost':>12}")
    print("  " + "-" * 93)
    for _, row in prescriber_agg.iterrows():
        print(
            f"  {row['prscrbr_npi']:<12}  {row['prscrbr_last_org_name'][:21]:<22}  "
            f"{row['prscrbr_state_abrvtn']:<5}  {str(row['prscrbr_type'])[:27]:<28}  "
            f"{row['total_claims']:>8,.0f}  ${row['total_cost']:>11,.0f}"
        )


# ---------------------------------------------------------------------------
# 4. Cost per claim vs. specialty average (opioids)
# ---------------------------------------------------------------------------

def cost_per_claim_vs_specialty(df: pd.DataFrame) -> None:
    df = df.copy()
    df["cost_per_claim"] = df["tot_drug_cst"] / df["tot_clms"].replace(0, float("nan"))

    # Specialty-level average cost per claim
    specialty_avg = (
        df.groupby("prscrbr_type")["cost_per_claim"]
        .mean()
        .rename("specialty_avg_cost_per_claim")
        .reset_index()
    )

    df = df.merge(specialty_avg, on="prscrbr_type", how="left")
    df["ratio_to_specialty_avg"] = df["cost_per_claim"] / df["specialty_avg_cost_per_claim"]

    outliers = (
        df[df["tot_clms"] >= 50]
        .sort_values("ratio_to_specialty_avg", ascending=False)
        .head(15)
    )

    print("\n=== Prescribers with Highest Cost-Per-Claim vs. Specialty Average (min 50 claims) ===")
    print(f"  {'NPI':<12}  {'State':<5}  {'Specialty':<28}  {'Cost/Claim':>11}  {'Ratio':>7}")
    print("  " + "-" * 70)
    for _, row in outliers.iterrows():
        print(
            f"  {row['prscrbr_npi']:<12}  {row['prscrbr_state_abrvtn']:<5}  "
            f"{str(row['prscrbr_type'])[:27]:<28}  "
            f"${row['cost_per_claim']:>10,.2f}  {row['ratio_to_specialty_avg']:>7.2f}x"
        )


# ---------------------------------------------------------------------------
# 5. Brand vs. generic opioid prescribing rate by state
# ---------------------------------------------------------------------------

def brand_vs_generic_by_state(df: pd.DataFrame) -> None:
    """
    Approximate brand vs. generic split: rows where brnd_name != gnrc_name
    (case-insensitive) are treated as brand prescriptions.
    """
    df = df.copy()
    df["is_brand"] = (
        df["brnd_name"].str.strip().str.upper() !=
        df["gnrc_name"].str.strip().str.upper()
    )

    state_split = (
        df.groupby(["prscrbr_state_abrvtn", "is_brand"])["tot_clms"]
        .sum()
        .unstack(fill_value=0)
        .rename(columns={False: "generic_claims", True: "brand_claims"})
    )
    state_split["brand_rate"] = (
        state_split["brand_claims"] /
        (state_split["brand_claims"] + state_split["generic_claims"])
    )
    state_split = state_split.sort_values("brand_rate", ascending=False).head(15)

    print("\n=== Brand vs. Generic Opioid Prescribing Rate by State (top 15 by brand rate) ===")
    print(f"  {'State':<6}  {'Brand Claims':>13}  {'Generic Claims':>15}  {'Brand Rate':>11}")
    print("  " + "-" * 50)
    for state, row in state_split.iterrows():
        print(
            f"  {state:<6}  {row['brand_claims']:>13,.0f}  {row['generic_claims']:>15,.0f}  "
            f"{row['brand_rate']:>10.1%}"
        )


# ---------------------------------------------------------------------------
# 6. Year-over-year change in opioid prescribing 2019-2022 (COVID effect)
#    Requires fetching multiple annual dataset IDs from data.cms.gov
# ---------------------------------------------------------------------------

# Dataset IDs by year (verify against data.cms.gov catalog; IDs may differ)
ANNUAL_DATASET_IDS = {
    2019: "3z4d-vmhm",
    2020: "4c7x-tq33",
    2021: "s8h2-7gks",
    2022: "9n45-2f87",
}


def fetch_national_opioid_totals(year: int, dataset_id: str) -> dict:
    """Fetch aggregate national opioid claims and cost for one program year."""
    url = f"{SOCRATA_BASE}/{dataset_id}.csv"
    frames = []
    for drug in OPIOID_FILTERS:
        params = {
            "$where": f"LOWER(gnrc_name) LIKE '%{drug}%'",
            "$select": "tot_clms,tot_drug_cst",
            "$limit": 500000,
        }
        resp = requests.get(url, params=params, timeout=180)
        resp.raise_for_status()
        df = pd.read_csv(io.StringIO(resp.text), dtype=str, low_memory=False)
        df["tot_clms"]    = pd.to_numeric(df["tot_clms"], errors="coerce")
        df["tot_drug_cst"] = pd.to_numeric(df["tot_drug_cst"], errors="coerce")
        frames.append(df)
    combined = pd.concat(frames, ignore_index=True)
    return {
        "year":        year,
        "total_claims": combined["tot_clms"].sum(),
        "total_cost":   combined["tot_drug_cst"].sum(),
    }


def yoy_opioid_trend() -> None:
    rows = []
    for year, dataset_id in ANNUAL_DATASET_IDS.items():
        print(f"  Fetching {year}...")
        rows.append(fetch_national_opioid_totals(year, dataset_id))

    trend = pd.DataFrame(rows).sort_values("year")
    trend["claims_yoy_pct"] = trend["total_claims"].pct_change() * 100
    trend["cost_yoy_pct"]   = trend["total_cost"].pct_change() * 100

    print("\n=== National Medicare Part D Opioid Prescribing Trend 2019-2022 ===")
    print(f"  {'Year':<6}  {'Total Claims':>13}  {'YoY %':>8}  {'Total Cost ($)':>15}  {'YoY %':>8}")
    print("  " + "-" * 57)
    for _, row in trend.iterrows():
        yoy_c = f"{row['claims_yoy_pct']:+.1f}%" if pd.notna(row['claims_yoy_pct']) else "  --"
        yoy_k = f"{row['cost_yoy_pct']:+.1f}%"  if pd.notna(row['cost_yoy_pct'])   else "  --"
        print(
            f"  {int(row['year']):<6}  {row['total_claims']:>13,.0f}  {yoy_c:>8}  "
            f"${row['total_cost']:>14,.0f}  {yoy_k:>8}"
        )


# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------

def main() -> None:
    print("Fetching CMS Part D opioid prescriber data (2022)...")
    df = fetch_opioid_rows()
    df = to_numeric_cols(df)

    opioid_claims_by_state(df)
    top_opioid_prescribers(df)
    cost_per_claim_vs_specialty(df)
    brand_vs_generic_by_state(df)

    print("\nFetching year-over-year trend (2019-2022)...")
    yoy_opioid_trend()


if __name__ == "__main__":
    main()

The Socrata LIKE filter applied server-side against gnrc_nameis the correct approach for drug-name filtering at this dataset scale: pulling the full 25 million-row annual file and filtering locally would require substantial memory and bandwidth. The deduplication step after concatenation removes rows that match multiple opioid filters (e.g., a drug whose generic name contains both “oxycodone” and another matched string). The brand-versus-generic split treats rows where brnd_name differs from gnrc_nameas brand prescriptions, which approximates the formulary-tier classification without requiring access to plan formulary files.

The year-over-year trend function fetches each program year separately using its Socrata dataset ID. These IDs change with each annual release; the values in the script are correct as of the 2022 release cycle and should be verified against the data.cms.gov catalog before use. The COVID-19 effect on opioid prescribing—a documented shift in 2020 reflecting both changes in healthcare utilization and the acceleration of telehealth-based prescribing during pandemic conditions—is detectable in the year-over-year claim volume comparison.