Medicare Part D Drug Spending Data: The Federal Database Behind $225 Billion in Annual Prescription Drug Costs

Every year the Centers for Medicare & Medicaid Services publishes two datasets that together expose the architecture of prescription drug spending for America's largest public insurer: which prescribers wrote prescriptions for which drugs, in what volume and at what gross cost, and how much Medicare plans spent on every brand and generic they covered. Neither file identifies individual patients. Across roughly 50 million enrolled beneficiaries and more than $225 billion in annual drug costs, the Part D data is the most granular window into pharmaceutical economics available from any public source, and it has driven federal investigations, Pulitzer Prize–winning journalism, and landmark drug pricing legislation.

Origins: the Medicare Prescription Drug, Improvement, and Modernization Act

Medicare was created by the Social Security Amendments of 1965 to provide hospital insurance (Part A) and supplemental medical insurance (Part B) for Americans aged 65 and older. For nearly four decades, outpatient prescription drugs were not covered—a gap that grew increasingly consequential as pharmaceutical treatment displaced surgery and hospitalization for conditions ranging from hypertension to HIV. By the late 1990s, roughly one-third of Medicare beneficiaries had no drug coverage of any kind, and one in four reported skipping prescriptions because of cost.

The Medicare Prescription Drug, Improvement, and Modernization Act of 2003 (MMA) created Part D, a voluntary outpatient prescription drug benefit administered not by the federal government directly but through private insurance plans. Congress deliberately structured Part D as a private-market program: instead of creating a federal drug benefit, CMS certifies and subsidizes private Prescription Drug Plans (PDPs) and Medicare Advantage plans with integrated drug coverage (MA-PD plans). Plans compete on premium, formulary design, and pharmacy network. Beneficiaries choose among plans in their region; CMS provides a federal subsidy that covers the majority of expected costs, calibrated through a sealed-bid process that CMS evaluates annually.

Part D took effect January 1, 2006. Within six months, more than 22 million beneficiaries had enrolled. As of 2023, approximately 50 million Medicare beneficiaries are enrolled in Part D either through a standalone PDP or an MA-PD plan. Total Part D drug spending reached roughly $225–230 billion in 2022, with the federal subsidy accounting for approximately $100 billion of that total. The remainder is covered through plan premiums, manufacturer rebates passed through to plan bids, and beneficiary cost-sharing.

Benefit structure and the coverage phases

The Part D benefit is structured in phases that determine what the beneficiary pays at each level of annual drug spending. The standard benefit structure for 2024 illustrates the design:

Deductible phase: The beneficiary pays 100% of drug costs until spending reaches the annual deductible, set at $545 in 2024. Plans may waive or reduce the deductible for certain tiers (typically generic drugs on Tier 1 and Tier 2).
Initial coverage phase: After the deductible, the beneficiary pays 25% coinsurance (with the plan paying 75%) until total drug costs reach the initial coverage limit (ICL). In 2024 the ICL is $5,030 in total drug spending.
Coverage gap: The original benefit design created a “donut hole,” a gap between the ICL and the catastrophic threshold where beneficiaries historically paid 100% of drug costs. The Affordable Care Act of 2010 phased out the donut hole over ten years through manufacturer discounts and increased plan coverage. By 2020 the gap was effectively closed: for brand-name drugs, manufacturers pay a 70% discount and plans cover 5%; for generics, plans cover 75%. Either way beneficiaries pay only 25%, the same coinsurance rate as in the initial coverage phase.
Catastrophic coverage phase: Once a beneficiary's true out-of-pocket spending reaches the catastrophic threshold ($8,000 in 2024), Medicare pays nearly all remaining costs. Prior to the Inflation Reduction Act of 2022, beneficiaries in catastrophic coverage paid 5% coinsurance with no cap, creating exposure of thousands of dollars for high-cost specialty drug users. The IRA eliminated this 5% coinsurance beginning in 2024 and imposed a hard $2,000 annual out-of-pocket cap starting in 2025—the first OOP cap in Part D history.
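
The phase arithmetic above can be sketched in a few lines of Python. This is a simplified illustration, not CMS's calculation: it assumes a beneficiary on the 2024 standard benefit with no low-income subsidy who fills either only brand drugs or only generics, and it assumes that the 70% manufacturer discount counts toward true out-of-pocket (TrOOP) spending for brand fills in the gap. The function name and the `brand_only` switch are hypothetical.

```python
# Simplified sketch of the 2024 standard Part D benefit (no LIS, no plan
# enhancements). Dollar parameters come from the phase descriptions above.
DEDUCTIBLE = 545.00        # beneficiary pays 100% up to here
ICL = 5030.00              # initial coverage limit, in total drug cost
TROOP_THRESHOLD = 8000.00  # catastrophic threshold, in TrOOP dollars
COINSURANCE = 0.25         # beneficiary share after the deductible
MFR_DISCOUNT = 0.70        # ASSUMPTION: counts toward TrOOP for brand fills


def beneficiary_oop_2024(total_drug_cost: float, brand_only: bool = True) -> float:
    """Return the beneficiary's out-of-pocket cost for a year of gross drug cost."""
    cost = max(total_drug_cost, 0.0)

    # Deductible phase: 100% of cost up to the deductible.
    oop = min(cost, DEDUCTIBLE)

    # Initial coverage phase: 25% of cost between the deductible and the ICL.
    in_initial = min(max(cost - DEDUCTIBLE, 0.0), ICL - DEDUCTIBLE)
    oop += COINSURANCE * in_initial
    troop = oop

    # Coverage gap: beneficiary still pays 25%, but TrOOP grows faster for
    # brand drugs because the manufacturer discount is credited as well.
    troop_rate = COINSURANCE + (MFR_DISCOUNT if brand_only else 0.0)
    gap_width = (TROOP_THRESHOLD - troop) / troop_rate
    in_gap = min(max(cost - ICL, 0.0), gap_width)
    oop += COINSURANCE * in_gap

    # Catastrophic phase: no beneficiary cost sharing in 2024.
    return round(oop, 2)


if __name__ == "__main__":
    for gross in (545, 5030, 100_000):
        print(gross, beneficiary_oop_2024(gross), beneficiary_oop_2024(gross, brand_only=False))
```

Under these assumptions a brand-only beneficiary stops paying at about $3,333 out of pocket, well below the nominal $8,000 threshold, while a generic-only beneficiary would have to spend the full $8,000 personally. Real beneficiaries fall between the two.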

The benefit structure matters for data interpretation because CMS reports gross drug spending (the total cost of the drug before any rebates), the amount paid by the plan, the amount paid by the government low-income subsidy (LIS), and the amount paid by the beneficiary. The “total drug cost” figure in the Part D prescriber data is gross cost: it reflects the point-of-sale price (ingredient cost, dispensing fee, and any sales tax), which tracks the manufacturer's list price rather than the net price after rebates. Because rebates flow back to plans and CMS after the point of sale and are not reflected in claims data, the total drug cost in the PUF systematically overstates what plans and the government ultimately pay, by the full amount of those rebates.

CMS Part D Prescriber Public Use File

Beginning with 2013 data, CMS has published the Part D Prescriber Public Use File (PUF) annually. The dataset is structured at the prescriber–drug level: each row represents a unique combination of an individual prescriber (identified by NPI) and a specific drug (identified by generic name and brand name). The fields include:

NPI: the 10-digit National Provider Identifier for the prescribing physician or other eligible prescriber
Prescriber last name / organization name and first name
Prescriber city, state, ZIP code
Prescriber type: the specialty designation from Medicare enrollment data (e.g., “Internal Medicine,” “Family Practice,” “Psychiatry,” “Pain Management”)
Drug name: both brand name and generic (INN) name
Total beneficiaries: unique Medicare Part D beneficiaries who received that prescriber's prescription for that drug during the year
Total claims: total number of Part D prescription fills (each fill is one claim; a 90-day supply counts as one claim)
Total 30-day standardized fills: claims converted to 30-day equivalent units, enabling apples-to-apples comparison across fill durations
Total day supply: aggregate days of drug supply dispensed
Total drug cost: total gross cost of all fills at the point of sale, before plan rebates
Brand/generic flag
Opioid, extended-release opioid, and antibiotic flags: binary indicators added by CMS to support policy analysis
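
A minimal sketch of how these fields combine into the per-unit measures most analyses start from. The field names follow the lowercase column names used in the script later in this article; `tot_30day_fills` is an assumed name for the standardized-fill column, and the sample row is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PrescriberDrugRow:
    """One prescriber-drug row, reduced to its volume and cost fields."""
    tot_clms: int            # number of fills (claims)
    tot_30day_fills: float   # ASSUMED column name for standardized fills
    tot_day_suply: int       # aggregate days supplied
    tot_drug_cst: float      # gross cost before rebates

    def cost_per_claim(self) -> float:
        return self.tot_drug_cst / self.tot_clms

    def cost_per_30day_fill(self) -> float:
        # Comparable across prescribers who favor 30-day vs 90-day fills.
        return self.tot_drug_cst / self.tot_30day_fills

    def cost_per_day(self) -> float:
        return self.tot_drug_cst / self.tot_day_suply

    def days_per_claim(self) -> float:
        # Values near 90 suggest mostly 90-day fills; near 30, monthly fills.
        return self.tot_day_suply / self.tot_clms


# Invented example row, not real CMS data.
row = PrescriberDrugRow(tot_clms=120, tot_30day_fills=300.0,
                        tot_day_suply=9000, tot_drug_cst=45000.0)
print(row.cost_per_claim(), row.cost_per_30day_fill(),
      row.cost_per_day(), row.days_per_claim())
```

Cost per claim alone can mislead: the invented prescriber above looks expensive at $375 per claim only because the average fill covers 75 days, while cost per 30-day fill is $150.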

CMS applies a privacy suppression rule: any prescriber–drug combination with fewer than 11 claims is suppressed entirely, and in rows that are published, a beneficiary count below 11 is left blank. The intent is to prevent re-identification of individual beneficiaries in small patient populations. The practical effect is that low-volume prescriptions, particularly for rare diseases or orphan drugs, are undercounted or absent, and that beneficiary totals summed across rows understate the true count. For high-volume drugs in common specialties, suppression affects a negligible fraction of claims.

The prescriber PUF is available at data.cms.gov via bulk CSV download and through the Socrata API. A separate dataset—the Part D Prescribers by Provider file—aggregates to the prescriber level across all drugs, providing total Medicare Part D claims, total beneficiaries, and total drug cost per NPI without the drug-level detail. A third dataset, the Part D Prescribers by Geography file, aggregates to the state level for high-level trend analysis.
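
One way to gauge how much suppression hides for a given prescriber is to compare the two files: sum the claims visible in the drug-level rows and subtract from the prescriber-level total. The sketch below assumes that the prescriber-level file's total includes claims from drug rows that were suppressed in the drug-level file, and that both files expose `prscrbr_npi` and `tot_clms` columns; verify both assumptions against the CMS data dictionary. The toy rows are invented.

```python
import pandas as pd


def suppressed_claim_share(by_drug: pd.DataFrame, by_provider: pd.DataFrame) -> pd.DataFrame:
    """Estimate, per NPI, the share of claims missing from the drug-level file."""
    visible = (
        by_drug.groupby("prscrbr_npi", as_index=False)["tot_clms"]
        .sum()
        .rename(columns={"tot_clms": "visible_clms"})
    )
    out = by_provider.merge(visible, on="prscrbr_npi", how="left")
    # Prescribers with no published drug rows have zero visible claims.
    out["visible_clms"] = out["visible_clms"].fillna(0)
    out["hidden_clms"] = out["tot_clms"] - out["visible_clms"]
    out["hidden_share"] = out["hidden_clms"] / out["tot_clms"]
    return out


# Invented toy data: NPI ...01 has two published drug rows, ...03 has none.
by_drug = pd.DataFrame({
    "prscrbr_npi": ["1000000001", "1000000001", "1000000002"],
    "tot_clms": [100, 50, 20],
})
by_provider = pd.DataFrame({
    "prscrbr_npi": ["1000000001", "1000000002", "1000000003"],
    "tot_clms": [180, 20, 15],
})
result = suppressed_claim_share(by_drug, by_provider).set_index("prscrbr_npi")
print(result[["tot_clms", "visible_clms", "hidden_share"]])
```

A high hidden share flags prescribers whose drug-level profile is too incomplete to support comparisons, which is typical of specialists who prescribe many drugs at low volume each.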

Drug-level spending: what Medicare pays for each drug

The Part D Drug Spending Dashboard and the associated Part D Spending by Drug dataset publish aggregate figures at the drug level: for each drug, annual total claims, unique beneficiaries served, total gross spending, average spending per claim, and average spending per beneficiary. These figures cover all Part D plans nationally and represent the most comprehensive public view of pharmaceutical spending in the US.

The top drugs by total Part D spending in recent years illustrate the concentration of pharmaceutical expenditure and the dominance of specialty drugs:

Eliquis (apixaban): approximately $14–16 billion in annual Part D gross spending, making it consistently the largest single drug expenditure in the program. Apixaban is a direct oral anticoagulant (DOAC) used for atrial fibrillation stroke prevention and DVT/PE treatment. It replaced warfarin in most guideline-directed therapy over the 2015–2020 period. Its selection as one of the first 10 drugs subject to IRA price negotiation reflects its exceptional cost burden on the program.
Humira (adalimumab): approximately $6 billion in Part D before the 2023 biosimilar launches, though Humira's primary market is commercial insurance; Part D covers a minority of Humira users. Adalimumab is a TNF-alpha inhibitor used for rheumatoid arthritis, psoriasis, Crohn's disease, and other inflammatory conditions. Its biosimilar transition beginning in January 2023 is a major natural experiment in Part D formulary competition dynamics.
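Keytruda (pembrolizumab): approximately $5 billion in annual Medicare spending, growing rapidly as pembrolizumab's oncology indications have expanded to cover more than 20 cancer types. Nearly all of that spending falls under Part B rather than Part D, because pembrolizumab is given by intravenous infusion in physician offices and hospital outpatient departments and has no oral formulation. It therefore appears in the companion Medicare Part B spending-by-drug data rather than the Part D files, and is a reminder that Part D totals exclude physician-administered drugs.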
Xarelto (rivaroxaban): approximately $5 billion, another DOAC in the same anticoagulation market as Eliquis. Also included in the first IRA negotiation cohort.
Ozempic (semaglutide) and Victoza (liraglutide): GLP-1 receptor agonists originally indicated for type 2 diabetes management and cardiovascular risk reduction. Spending for GLP-1 agonists has grown faster than any other drug class in recent years, driven by expanded indications and rising demand. Part D is barred by statute from covering drugs used solely for weight loss, so products labeled only for weight management, such as Saxenda (liraglutide), are generally not covered, and weight-loss use reaches the data mainly through diabetes-labeled products. CMS data shows total GLP-1 spending roughly doubling year-over-year in 2022–2024, placing them among the top-five expenditure categories despite their relatively recent Medicare formulary inclusion.
Revlimid (lenalidomide): oral immunomodulatory agent used for multiple myeloma and myelodysplastic syndromes. Average wholesale price exceeded $800 per pill before generic entry in 2022; annual treatment cost could exceed $200,000. One of the clearest examples of the specialty drug spending concentration problem.

Specialty drugs, broadly defined as drugs requiring special handling, administration, or monitoring and priced above the CMS specialty-tier threshold ($950 per month for 2024, up from $600 in the program's early years), now account for approximately 50% of total Part D gross drug spending despite representing fewer than 1% of total claims volume. The top 1% of Part D beneficiaries by drug spending account for roughly 25% of total gross program cost.
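
Concentration figures like these reduce to a simple share calculation. The sketch below is illustrative only: the beneficiary-level figure cannot be reproduced from the public files, which carry no beneficiary-level rows, but the same hypothetical `top_share` helper applies unchanged to drug-level spending (the share of gross cost taken by the top N% of drugs). The input numbers are invented to make the arithmetic easy to follow.

```python
from typing import Sequence


def top_share(spending: Sequence[float], top_fraction: float) -> float:
    """Share of total spending accounted for by the top `top_fraction` of units."""
    ordered = sorted(spending, reverse=True)
    total = sum(ordered)
    if not ordered or total == 0:
        return 0.0
    # Always include at least one unit so tiny samples do not return zero.
    k = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:k]) / total


# Invented population: 99 people at $1,000 a year and one at $33,000.
population = [1_000.0] * 99 + [33_000.0]
print(f"Top 1% share of spending: {top_share(population, 0.01):.0%}")
```

In this toy population a single high-cost patient carries a quarter of all spending, which is the shape of the distribution the 1% and 25% figures describe.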

Plan design: formularies, tiers, and PBM rebates

Part D plans must cover at least two drugs in each therapeutic category, with certain protected classes (immunosuppressants for transplant recipients, antiretrovirals, antidepressants, antipsychotics, anticonvulsants, and antineoplastics) requiring coverage of essentially all FDA-approved drugs. Within those constraints, plans have substantial latitude in formulary design and cost-sharing structure.

The typical Part D formulary uses five tiers:

Tier 1 — Preferred generic: lowest copay, typically $0–$5
Tier 2 — Non-preferred generic or preferred brand: moderate copay, typically $10–$30
Tier 3 — Non-preferred brand: higher copay, typically $40–$60
Tier 4 — Non-preferred brand or high-cost generic: coinsurance rather than flat copay, typically 25–40%
Tier 5 — Specialty: coinsurance of 25–33%, so the beneficiary's cost per fill typically exceeds $150 per month and can reach thousands of dollars for the highest-cost biologics before the beneficiary reaches catastrophic coverage
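
The difference between flat copays and percentage coinsurance is easiest to see with numbers. The following sketch uses a hypothetical plan whose cost-sharing values were picked from within the typical ranges listed above; no actual plan is being described. It also assumes the common rule that a beneficiary never pays more than the drug's price on a copay tier.

```python
# Hypothetical plan design; values chosen from the typical ranges above.
TIER_RULES = {
    1: ("copay", 3.00),
    2: ("copay", 20.00),
    3: ("copay", 47.00),
    4: ("coinsurance", 0.30),
    5: ("coinsurance", 0.25),
}


def beneficiary_share(tier: int, drug_price: float) -> float:
    """Beneficiary cost for one fill during the initial coverage phase."""
    kind, value = TIER_RULES[tier]
    if kind == "copay":
        # ASSUMPTION: the beneficiary pays the lesser of copay and price.
        return round(min(value, drug_price), 2)
    return round(value * drug_price, 2)


for tier, price in [(1, 12.00), (3, 450.00), (4, 200.00), (5, 6000.00)]:
    print(f"Tier {tier}, ${price:,.2f} fill -> beneficiary pays ${beneficiary_share(tier, price):,.2f}")
```

On the copay tiers the beneficiary's cost is insensitive to price, while on the specialty tier a $6,000 fill costs $1,500, which is why list prices matter so much to specialty drug users.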

Plans also use utilization management tools—prior authorization (PA), step therapy (requiring a patient to try lower-cost alternatives before a preferred drug is covered), and quantity limits—to control spending on high-cost and high-utilization drugs. PA and step therapy requirements are visible in plan formulary data published on the CMS Plan Finder, though not in the prescriber PUF.

The rebate system fundamentally complicates interpretation of the Part D cost data. Pharmacy Benefit Managers (the three dominant PBMs are CVS Caremark, owned by CVS Health; Express Scripts, owned by Cigna; and OptumRx, owned by UnitedHealth Group) negotiate manufacturer rebates on behalf of the plans they administer. A drug manufacturer seeking preferred formulary placement (a preferred brand tier) typically pays the PBM a rebate calibrated as a percentage of the drug's list price. For blockbuster brand-name drugs with significant therapeutic competition, rebates can reach 60–80% of list price.

The insulin pricing scandal illustrates the gap between list and net prices. Insulin analogs from Eli Lilly, Novo Nordisk, and Sanofi carried list prices exceeding $300 per vial by 2020. Manufacturer rebates to PBMs and plans often exceeded 80% of that list price, meaning the net cost to plans and CMS was below $60 per vial. But beneficiaries—particularly those in the deductible phase or with high-deductible plans—paid coinsurance based on the list price, not the net price. A diabetic beneficiary with a $545 deductible and a $300-per-vial insulin list price paid the full $300 per vial until reaching the deductible, despite the plan's actual cost being a fraction of that amount. The IRA's $35-per-month insulin copay cap for Medicare (effective 2023) directly addressed this structural mismatch.
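
The mismatch can be made concrete with the figures in the paragraph above: a $300 list price, an 80% rebate, and the $545 deductible. The sketch below is a simplification that assumes the standard 25% coinsurance applies once the deductible is met and ignores the $35 cap that later replaced this arrangement; both function names are hypothetical.

```python
from typing import List


def net_price(list_price: float, rebate_share: float) -> float:
    """Plan's net cost per unit after the manufacturer rebate."""
    return round(list_price * (1.0 - rebate_share), 2)


def fill_payments(list_price: float, n_fills: int,
                  deductible: float = 545.00, coinsurance: float = 0.25) -> List[float]:
    """Beneficiary payment for each successive fill, priced at list."""
    payments = []
    remaining = deductible
    for _ in range(n_fills):
        in_deductible = min(list_price, remaining)   # portion paid at 100%
        after = list_price - in_deductible           # portion paid at coinsurance
        payments.append(round(in_deductible + coinsurance * after, 2))
        remaining -= in_deductible
    return payments


print("Plan net cost per vial:", net_price(300.00, 0.80))
print("Beneficiary pays per vial:", fill_payments(300.00, 3))
```

For the first vial the beneficiary pays $300 against a plan net cost of $60, five times what the drug ultimately costs the plan, and even the third vial's $75 coinsurance exceeds the plan's net cost.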

Inflation Reduction Act 2022 drug pricing reforms

The Inflation Reduction Act, signed August 16, 2022, enacted the most substantial changes to Medicare drug pricing since Part D's creation in 2003. The IRA contains four major drug pricing provisions:

Medicare drug price negotiation

Section 11001 of the IRA authorizes CMS to negotiate a “maximum fair price” (MFP) directly with manufacturers for high-expenditure Medicare drugs that lack generic or biosimilar competition. This overrides the MMA's explicit prohibition on CMS negotiation (the “noninterference” clause), which had been a defining feature of Part D since the program's enactment nearly two decades earlier.

CMS announced the first 10 drugs selected for negotiation in August 2023:

Eliquis (apixaban) — Bristol-Myers Squibb / Pfizer
Jardiance (empagliflozin) — Boehringer Ingelheim / Eli Lilly
Xarelto (rivaroxaban) — Janssen / Bayer
Januvia (sitagliptin) — Merck
Farxiga (dapagliflozin) — AstraZeneca
Entresto (sacubitril-valsartan) — Novartis
Enbrel (etanercept) — Amgen
Imbruvica (ibrutinib) — AbbVie / J&J
Stelara (ustekinumab) — J&J
Fiasp / NovoLog (insulin aspart) — Novo Nordisk

The negotiated maximum fair prices are effective January 1, 2026. CMS published the negotiated prices in August 2024, with reductions from list price ranging from approximately 38% (Imbruvica) to 79% (Januvia). The IRA directs CMS to select up to 15 additional drugs for negotiated prices taking effect in 2027, up to 15 more for 2028, and up to 20 per year thereafter, with the ceiling on each negotiated price set by a statutory formula tied to how long the drug has been on the market.

The pharmaceutical industry responded with extensive litigation. Bristol-Myers Squibb, Merck, Johnson & Johnson, and the Chamber of Commerce each filed lawsuits in federal court challenging the IRA's negotiation program as an unconstitutional taking (Fifth Amendment), a violation of the First Amendment (compelled speech), and a violation of the non-delegation doctrine. As of mid-2025, the cases were proceeding through appellate review with no final resolution, and the negotiated prices remained scheduled to take effect in January 2026.

Inflation rebates

Separately from the negotiation program, the IRA requires manufacturers to pay CMS a rebate whenever a drug's Medicare Part D or Part B price rises faster than general inflation (CPI-U), measured against a 2021 benchmark. A manufacturer that exceeds the benchmark owes CMS the amount by which its current price exceeds the inflation-adjusted benchmark price, multiplied by the units dispensed under Medicare. This provision directly addresses the systematic pattern, documented extensively in CMS and IQVIA data, of manufacturers taking list price increases of 5–15% annually for established drugs facing no new competition, while rebates kept net prices roughly flat.
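
The rebate logic can be sketched as a one-line formula. This is a simplified illustration rather than the statutory calculation, which specifies particular price measures, benchmark periods, and unit exclusions; the function name, the index values, and the prices below are all invented.

```python
def inflation_rebate(benchmark_price: float, benchmark_cpi: float,
                     current_price: float, current_cpi: float,
                     medicare_units: float) -> float:
    """Rebate owed when price growth outpaces CPI-U growth since the benchmark."""
    # Price the drug would have today had it risen exactly with inflation.
    inflation_adjusted = benchmark_price * (current_cpi / benchmark_cpi)
    excess_per_unit = max(0.0, current_price - inflation_adjusted)
    return round(excess_per_unit * medicare_units, 2)


# Invented example: 10% price increase against 4% inflation, 1M units.
owed = inflation_rebate(benchmark_price=100.00, benchmark_cpi=100.0,
                        current_price=110.00, current_cpi=104.0,
                        medicare_units=1_000_000)
print(f"Rebate owed: ${owed:,.2f}")
```

In the invented case the manufacturer keeps the inflation-matched $4 of its $10 increase and returns the remaining $6 per unit, or $6 million in total. A price that rises more slowly than inflation produces no rebate.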

Out-of-pocket cap and Low-Income Subsidy expansion

The IRA's $2,000 annual out-of-pocket cap (effective 2025) is projected by CBO to benefit approximately 1.4 million Part D enrollees who would otherwise exceed $2,000 in annual spending under prior law. High-cost specialty drug users—cancer patients on oral chemotherapy, transplant recipients, multiple myeloma patients on lenalidomide or pomalidomide—are the primary beneficiaries. The IRA also expanded eligibility for the Low-Income Subsidy (Extra Help) program beginning 2024, eliminating the “partial LIS” tier and bringing all LIS-eligible beneficiaries to full subsidy status (no deductible, nominal copays).

Low-Income Subsidy and dual eligibles

Approximately 13 million Part D enrollees receive the Low-Income Subsidy (LIS), also called Extra Help. Full LIS beneficiaries pay no Part D premium (up to the benchmark premium in their region), no deductible, and nominal copays ($1–$10 depending on drug type and income level). The federal government funds the full cost-sharing subsidy through appropriations separate from the Part D premium subsidy.

Dual-eligible beneficiaries—those enrolled in both Medicare and Medicaid, approximately 12 million individuals—are automatically enrolled in a Part D benchmark plan and receive full LIS. The transition of dual eligibles from Medicaid drug coverage to Part D in January 2006 created the “clawback” mechanism: state Medicaid programs pay CMS a monthly per-beneficiary contribution (the “phased-down state contribution”) representing their estimated savings from no longer covering dual-eligible drug costs. The clawback amounts are calculated from 2003 baseline Medicaid drug expenditures and have been a persistent source of intergovernmental fiscal tension.

LIS beneficiaries generate distinct patterns in the Part D prescriber data. Because they pay minimal cost-sharing at the point of sale, they have fewer barriers to adherence and tend to fill prescriptions more consistently. LIS enrollees account for a disproportionate share of total Part D claims volume relative to their enrollment share, reflecting both their generally poorer health status and their near-zero cost-sharing barriers.

Journalism and investigative research using Part D data

The Part D prescriber data has been used by journalists and academic researchers to produce investigations that would have been impossible with any other public source. The opioid epidemic generated the most prominent applications.

ProPublica's Prescriber Checkup tool, first launched in 2013 with Part D data obtained through a Freedom of Information Act request and later updated from the CMS public releases, allowed any user to search a specific physician's prescribing patterns, compare them against specialty peers, and identify outliers. ProPublica's analysis found that a small fraction of prescribers (often in pain management practices, primary care settings in rural Appalachia, including Kentucky, and clinics that had already attracted DEA scrutiny) accounted for a disproportionate share of opioid claims nationally.

The Los Angeles Times used Part D data extensively in its 2016 investigation into opioid prescribing in California, identifying physicians who prescribed quantities of opioids inconsistent with their specialty or patient panel and correlating those prescribing patterns with state medical board disciplinary actions and DEA registrant data. STAT News built a drug spending database combining Part D PUF data with FDA label approvals and IQVIA net price estimates to publish annual analyses of the gap between gross Part D spending and estimated net spending after rebates.

Academic researchers have used Part D data to study biosimilar adoption patterns, formulary access for oncology drugs, prescribing variation by physician specialty and geography, and the effect of prior authorization requirements on medication adherence in specific conditions. The data is available without restriction and without IRB requirements for aggregate research (individual beneficiary data requires a CMS data use agreement), making it one of the most accessible large-scale datasets in health services research.

Biosimilar adoption and the Humira market experiment

Biologic drugs—large-molecule therapies produced through living cell culture rather than chemical synthesis—account for the majority of specialty drug spending and have been largely insulated from generic competition because of the complexity of their manufacturing and the regulatory pathway required for biosimilar approval. The Biologics Price Competition and Innovation Act (BPCIA) of 2009 created an abbreviated pathway for biosimilar approval by the FDA, analogous to the Hatch-Waxman pathway for small-molecule generics, but biosimilar market penetration in the US has been far slower than in Europe.

Humira (adalimumab) became the pivotal test case. AbbVie's strategy of building a “patent thicket” (filing more than 250 patents related to adalimumab formulation, manufacturing, and indications, then settling with biosimilar manufacturers to delay US entry until 2023, while biosimilars launched in Europe in 2018) is widely cited as the archetype of reference product life cycle management. The settlements staggered US entry: Amgen's Amjevita launched alone in January 2023, and most of the other adalimumab biosimilars followed in July 2023.

The downstream effect of Part D formulary decisions in the year following biosimilar launch is visible in CMS prescriber and drug spending data as shifts in claims between Humira and its biosimilars, although the formulary placements themselves are not recorded there. Plans faced a strategic choice: place biosimilar adalimumab on a preferred tier and subject reference Humira to non-preferred status or specialty tier (which typically requires PA and produces formulary pressure toward the biosimilar), or continue preferring Humira based on higher rebates that AbbVie offered to maintain formulary position. AbbVie responded to biosimilar entry with an aggressive rebate defense strategy, offering plans rebates on Humira that exceeded the net cost advantage of competing biosimilars, effectively making it financially rational for plans to keep Humira preferred despite its higher list price.

CMS Part D data for 2023 and 2024 makes it possible to measure how far biosimilar adalimumab penetrated Medicare in the first years after launch. Early uptake was modest and concentrated in plans that chose biosimilar preference, with many plans maintaining Humira preference. The IRA changes some of the surrounding incentives: it temporarily raises the Part B add-on payment for qualifying biosimilars, and its Part D benefit redesign shifts more catastrophic-phase liability onto plans starting in 2025, which increases the payoff to plans from steering toward products with a lower net cost. Whether these provisions are sufficient to overcome reference product rebate strategies is a live policy and market question that the annual Part D spending data will answer in successive releases.

Interchangeability status is an additional variable. FDA grants “interchangeable” designation to biosimilars that meet additional standards demonstrating safety when substituted for the reference product without prescriber intervention—the equivalent of generic substitution at the pharmacy. State pharmacy laws govern whether pharmacists may substitute an interchangeable biosimilar for a brand prescription. As of 2024, several adalimumab biosimilars had received interchangeable designation, but pharmacy-level substitution rates in Part D are not separately reported in public data; they are inferable only by comparing prescriber-level brand versus biosimilar fill rates in the prescriber PUF.

Python example: opioid prescribing analysis using the Part D PUF

The following script uses the CMS Socrata API to query the Part D Prescribers by Provider and Drug dataset for opioid medications, aggregates total opioid claims to the prescriber level, computes state-level prescribing intensity, and flags statistical outliers within each specialty using z-score analysis. Privacy suppression means that low-volume prescribers are undercounted; the analysis is most reliable for specialties with high opioid prescribing volume (pain management, anesthesiology, family practice, internal medicine). No API key is required.

import requests
import pandas as pd
import numpy as np
import io

# ---------------------------------------------------------------------------
# CMS Medicare Part D Prescriber Public Use File (PUF)
# Source: data.cms.gov Socrata API (no API key required for read access)
# Dataset: "Medicare Part D Prescribers - by Provider and Drug"
# The dataset ID below is for the 2022 data year. IDs change with each annual
# release and the API platform itself has changed over time, so confirm both
# the base URL and the ID on the dataset's page at data.cms.gov before running.
# ---------------------------------------------------------------------------

SOCRATA_BASE = "https://data.cms.gov/resource"
DATASET_ID = "9n45-2f87"   # Part D Prescribers by Provider and Drug, 2022

# Opioid drug names commonly appearing in Part D data (partial match list)
OPIOID_GENERICS = [
    "oxycodone", "hydrocodone", "morphine sulfate", "fentanyl",
    "tramadol", "codeine", "oxymorphone", "methadone", "buprenorphine",
    "hydromorphone", "tapentadol", "meperidine",
]

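def fetch_opioid_prescribers(limit: int = 200000) -> pd.DataFrame:
    """
    Fetch Part D prescriber-drug rows for opioid drugs.
    We filter server-side using Socrata SoQL; one request per opioid name.
    Returns a single concatenated, de-duplicated DataFrame.

    Caveats: the substring match also pulls combination products (for
    example codeine cough preparations), and buprenorphine is prescribed
    largely for opioid use disorder rather than pain. Filter further as needed.
    """
    url = f"{SOCRATA_BASE}/{DATASET_ID}.csv"
    frames = []
    for drug in OPIOID_GENERICS:
        params = {
            "$where": f"LOWER(gnrc_name) LIKE '%{drug}%'",
            "$limit": limit,
            "$select": (
                "prscrbr_npi,prscrbr_last_org_name,prscrbr_first_name,"
                "prscrbr_city,prscrbr_state_abrvtn,prscrbr_type,"
                "gnrc_name,brnd_name,tot_clms,tot_benes,tot_day_suply,tot_drug_cst"
            ),
        }
        resp = requests.get(url, params=params, timeout=120)
        resp.raise_for_status()
        df = pd.read_csv(io.StringIO(resp.text), dtype=str, low_memory=False)
        if len(df) >= limit:
            # There is no pagination here, so hitting the limit means rows were cut off.
            print(f"  WARNING: {drug} hit $limit={limit:,}; results are truncated")
        if not df.empty:
            df["query_opioid"] = drug
            frames.append(df)
        print(f"  {drug}: {len(df):,} rows")

    if not frames:
        return pd.DataFrame()
    combined = pd.concat(frames, ignore_index=True)
    # A row whose generic name matches two search terms is fetched twice.
    combined = combined.drop_duplicates(
        subset=["prscrbr_npi", "gnrc_name", "brnd_name"], ignore_index=True
    )
    print(f"Total opioid rows fetched: {len(combined):,}")
    return combined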


def clean_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Convert claim count, beneficiary, day supply, and cost columns to numeric."""
    for col in ["tot_clms", "tot_benes", "tot_day_suply", "tot_drug_cst"]:
        if col in df.columns:
            df[col] = pd.to_numeric(
                df[col].astype(str).str.replace(r"[$,]", "", regex=True),
                errors="coerce",
            )
    return df


def top_opioid_prescribers(df: pd.DataFrame, top_n: int = 25) -> pd.DataFrame:
    """
    Aggregate across all opioid drugs per prescriber (NPI level).
    Returns top_n by total opioid claims.
    total_opioid_benes is approximate: it skips suppressed (blank) counts
    and counts a patient once for every opioid they were prescribed.
    """
    agg = (
        df.groupby(
            ["prscrbr_npi", "prscrbr_last_org_name", "prscrbr_first_name",
             "prscrbr_state_abrvtn", "prscrbr_type"],
            as_index=False,
            dropna=False,  # keep rows with a blank first name (organizations)
        )
        .agg(
            total_opioid_claims=("tot_clms", "sum"),
            total_opioid_benes=("tot_benes", "sum"),
            total_opioid_cost=("tot_drug_cst", "sum"),
            distinct_opioids=("gnrc_name", "nunique"),
        )
        .sort_values("total_opioid_claims", ascending=False)
        .head(top_n)
    )
    return agg


def opioid_rate_by_state(df: pd.DataFrame) -> pd.DataFrame:
    """
    Summarize opioid claims, beneficiary counts, and gross cost by
    prescriber state, plus average gross cost per claim.
    This is a volume summary, not a population rate: a true rate needs a
    separate denominator (total Part D beneficiaries by state). The
    opioid_benes column skips suppressed counts and counts a patient once
    per prescriber-drug row, so treat it as a rough intensity measure only.
    """
    state = (
        df.groupby("prscrbr_state_abrvtn", as_index=False)
        .agg(
            opioid_claims=("tot_clms", "sum"),
            opioid_benes=("tot_benes", "sum"),
            opioid_cost=("tot_drug_cst", "sum"),
        )
    )
    state["cost_per_claim"] = state["opioid_cost"] / state["opioid_claims"]
    return state.sort_values("opioid_claims", ascending=False)


def outlier_prescribers_by_specialty(df: pd.DataFrame, z_threshold: float = 3.0) -> pd.DataFrame:
    """
    Flag prescribers whose opioid claims per beneficiary are z_threshold
    standard deviations above their specialty peer group.
    Rows with a suppressed (blank) beneficiary count are dropped first so
    that claims and beneficiaries are summed over the same rows; otherwise
    the ratio is inflated. Patients on more than one opioid are still
    counted once per drug. Low-volume prescribers (fewer than 50 total
    opioid claims) are excluded to reduce noise.
    """
    df = df.dropna(subset=["tot_clms", "tot_benes"])
    agg = (
        df.groupby(
            ["prscrbr_npi", "prscrbr_last_org_name", "prscrbr_first_name",
             "prscrbr_state_abrvtn", "prscrbr_type"],
            as_index=False,
            dropna=False,  # keep rows with a blank first name (organizations)
        )
        .agg(
            total_claims=("tot_clms", "sum"),
            total_benes=("tot_benes", "sum"),
        )
    )
    agg = agg[agg["total_claims"] >= 50].copy()
    agg["claims_per_bene"] = agg["total_claims"] / agg["total_benes"].replace(0, np.nan)

    # Compute specialty-level mean and std; a specialty with a single
    # prescriber has an undefined std and therefore produces no flags.
    specialty_stats = (
        agg.groupby("prscrbr_type")["claims_per_bene"]
        .agg(["mean", "std"])
        .rename(columns={"mean": "spec_mean", "std": "spec_std"})
        .reset_index()
    )
    agg = agg.merge(specialty_stats, on="prscrbr_type", how="left")
    agg["z_score"] = (agg["claims_per_bene"] - agg["spec_mean"]) / agg["spec_std"].replace(0, np.nan)

    outliers = agg[agg["z_score"] >= z_threshold].sort_values("z_score", ascending=False)
    return outliers


def main() -> None:
    print("Fetching CMS Part D opioid prescriber data...")
    raw = fetch_opioid_prescribers()
    if raw.empty:
        print("No rows returned. Check dataset ID or API availability.")
        return

    raw = clean_numeric(raw)

    # --- Top opioid prescribers nationally ---
    top = top_opioid_prescribers(raw, top_n=25)
    print()
    print("=== Top 25 Opioid Prescribers by Total Claims ===")
    print(f"{'NPI':<12} {'Last Name':<25} {'State':<6} {'Specialty':<30} "
          f"{'Claims':>8} {'Benes':>7} {'Cost':>12}")
    print("-" * 105)
    for _, r in top.iterrows():
        print(
            f"{r['prscrbr_npi']:<12} {str(r['prscrbr_last_org_name'])[:24]:<25} "
            f"{r['prscrbr_state_abrvtn']:<6} {str(r['prscrbr_type'])[:29]:<30} "
            f"{r['total_opioid_claims']:>8,.0f} {r['total_opioid_benes']:>7,.0f} "
            f"${r['total_opioid_cost']:>11,.0f}"
        )

    # --- State-level opioid prescribing intensity ---
    state_tbl = opioid_rate_by_state(raw)
    print()
    print("=== Opioid Prescribing by State (Top 15 by Volume) ===")
    print(f"{'State':<6} {'Claims':>10} {'Benes':>9} {'Total Cost':>13} {'Cost/Claim':>11}")
    print("-" * 54)
    for _, r in state_tbl.head(15).iterrows():
        print(
            f"{r['prscrbr_state_abrvtn']:<6} {r['opioid_claims']:>10,.0f} "
            f"{r['opioid_benes']:>9,.0f} ${r['opioid_cost']:>12,.0f} "
            f"${r['cost_per_claim']:>10,.2f}"
        )

    # --- Outlier prescribers by specialty ---
    outliers = outlier_prescribers_by_specialty(raw, z_threshold=3.0)
    print()
    print(f"=== Outlier Prescribers (z >= 3.0 vs Specialty Peers): {len(outliers):,} flagged ===")
    print(f"{'NPI':<12} {'Last Name':<25} {'State':<6} {'Specialty':<30} "
          f"{'Clms/Bene':>10} {'Z-Score':>8}")
    print("-" * 96)
    for _, r in outliers.head(20).iterrows():
        print(
            f"{r['prscrbr_npi']:<12} {str(r['prscrbr_last_org_name'])[:24]:<25} "
            f"{r['prscrbr_state_abrvtn']:<6} {str(r['prscrbr_type'])[:29]:<30} "
            f"{r['claims_per_bene']:>10.2f} {r['z_score']:>8.2f}"
        )


if __name__ == "__main__":
    main()

The z-score outlier detection in the final function is a rough computational analog of the peer comparisons that ProPublica's Prescriber Checkup offered interactively. A prescriber with a z-score of 5.0 or higher has opioid claims per beneficiary five standard deviations above their specialty mean, a pattern that warrants a closer look but is not by itself evidence of misconduct: short fill intervals, palliative or hospice caseloads, and misclassified specialties all inflate the ratio, and the beneficiary denominator is distorted by suppression and by patients being counted once per drug. The specialty stratification matters because raw claim counts are incomparable across specialties: a pain management physician legitimately prescribes far more opioids per patient than an internist, so the relevant comparison is always within-specialty. Joining the NPI output to the NPPES NPI registry (also public, at download.cms.gov/nppes) adds practice address, taxonomy codes, and state license numbers for follow-on investigation; medical school and graduation year come from CMS's separate clinician directory file rather than NPPES. Joining to CMS Open Payments data identifies financial relationships between high-volume prescribers and pharmaceutical manufacturers, a combination that has anchored multiple conflict-of-interest investigations.

For drug spending analysis, the same Socrata endpoint can be filtered by drug name to pull all prescribers of a specific drug, compute market share by state and specialty, and track how biosimilar launches shift prescribing away from reference products over time. The year-over-year comparison is most informative: the 2022 adalimumab rows in the PUF represent effectively 100% reference Humira, while the 2023 and 2024 rows show the biosimilar transition in progress, prescriber by prescriber. The PUF carries no plan identifier, so plan-level shifts cannot be observed directly.
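
A minimal sketch of that market-share calculation, run on invented rows rather than a live API call. It assumes the column names used in the script above, that every adalimumab product's generic name begins with "adalimumab" (biosimilars carry a suffix such as "-atto"), and that any adalimumab row whose brand name does not contain "Humira" is a biosimilar. All three assumptions should be checked against the actual file.

```python
import pandas as pd

REFERENCE_BRAND = "humira"  # matches "Humira", "Humira Pen", "Humira(Cf)", etc.


def biosimilar_share(rows: pd.DataFrame, by: str) -> pd.DataFrame:
    """Share of adalimumab claims filled with a biosimilar, grouped by `by`."""
    is_ada = rows["gnrc_name"].str.lower().str.startswith("adalimumab", na=False)
    ada = rows[is_ada].copy()
    is_reference = ada["brnd_name"].str.lower().str.contains(
        REFERENCE_BRAND, regex=False, na=False
    )
    # Keep claim counts only for biosimilar rows; reference rows contribute 0.
    ada["biosim_clms"] = ada["tot_clms"].where(~is_reference, 0)
    out = ada.groupby(by, as_index=False)[["tot_clms", "biosim_clms"]].sum()
    out["biosim_share"] = out["biosim_clms"] / out["tot_clms"]
    return out


# Invented toy rows, not real CMS data.
rows = pd.DataFrame({
    "prscrbr_state_abrvtn": ["TX", "TX", "CA", "CA", "CA"],
    "brnd_name": ["Humira Pen", "Hyrimoz", "Humira(Cf)", "Amjevita", "Eliquis"],
    "gnrc_name": ["Adalimumab", "Adalimumab-Adaz", "Adalimumab",
                  "Adalimumab-Atto", "Apixaban"],
    "tot_clms": [90, 10, 60, 40, 500],
})
shares = biosimilar_share(rows, by="prscrbr_state_abrvtn")
print(shares)
```

Passing `by="prscrbr_type"` or `by="prscrbr_npi"` gives the same share by specialty or by individual prescriber. Running it on successive data years and comparing the results is what turns the PUF into a record of the transition.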

For the hospital payment side of Medicare—how CMS pays for inpatient stays by DRG code, the IPPS payment formula, charge-to-payment ratios, and geographic payment variation—see CMS Medicare Inpatient Provider Data: The Hospital-Level Payment Records Behind $170 Billion in Annual DRG Reimbursements, which covers the full IPPS dataset available on data.cms.gov.

For pharmaceutical patent and exclusivity data that governs when generic and biosimilar competition can enter—directly determining which drugs remain high-cost in Part D and which become eligible for price negotiation under the IRA—see FDA Orange Book: The Patent and Exclusivity Database Behind Every Generic Drug Entry Decision, which covers the Orange Book patent listings, exclusivity codes, and the Paragraph IV certification process.