Technical writing

The pharma payment map: joining CMS Open Payments and Medicare Part D prescribing data

· 10 min read· AI Analytics
Healthcare dataCMSOpen PaymentsMedicare Part DPharma

Two public federal datasets — Open Payments and Medicare Part D prescribers — share a common key: the National Provider Identifier. Join them and you get a physician-level map of who received money from which drug manufacturer, and what they prescribed to Medicare patients. This is the methodology behind ProPublica's Dollars for Docs, and it is now reproducible against public APIs with no authentication required.

The Sunshine Act and Open Payments

The Physician Payments Sunshine Act, enacted as part of the Affordable Care Act in 2010, requires every drug and device manufacturer with products covered by Medicare, Medicaid, or CHIP to report every transfer of value given to physicians and teaching hospitals. The reporting obligation is broad: meals, travel, speaking fees, consulting arrangements, education grants, research funding, and equity stakes all qualify. CMS collects these reports and publishes them as the Open Payments dataset annually.

The underlying theory is transparency rather than prohibition. The Sunshine Act does not ban manufacturer-to-physician payments; it makes them visible. The assumption is that public disclosure changes behavior — manufacturers who know their payments will appear in a searchable federal database alongside the prescribing patterns of the physicians they pay have an incentive to be more circumspect, and journalists, researchers, and patients have the raw material to evaluate whether those payments correlate with prescribing decisions.

Open Payments data structure

CMS publishes Open Payments in three categories. Each is a separate download and a separate API endpoint, but they share a common schema for the fields that matter most to a join analysis.

General Payments

The bulk of the dataset: meals, travel, speaking fees, consulting arrangements, education grants, and any other non-research transfer of value. Approximately 13–15 million records per program year. This is the category where the payment-to-prescribing signal is strongest, because speaking fees and consulting arrangements are the mechanisms most directly tied to product-specific promotion.

Research Payments

Clinical trial payments and research-related transfers: investigator fees, IRB costs, overhead, and subcontractor payments routed through teaching hospitals. Approximately 750,000 records per year. Research payments require additional fields: the associated clinical trial NCT number where applicable, and whether the payment is to an individual physician or to an institution on behalf of one.

Ownership Interests

Equity stakes, investment interests, and stock returns. Approximately 400,000 records per year. These are structurally different from payments — they represent ongoing financial relationships rather than discrete transactions — and require separate handling in any longitudinal analysis.

The dataset covers program years 2013 through the most recent published year (currently 2023, with 2024 expected in mid-2025), available at openpaymentsdata.cms.gov and via the public API at no cost. Approximately 100 million total records across all years and categories. The hub indexes the full corpus:

curl https://api.ai-analytics.org/datasets/cms-open-payments

Key fields for the join

The fields that matter for a prescribing-correlation analysis are:

  • Covered_Recipient_NPI — the physician's National Provider Identifier. This is the join key to Part D prescribers. Not all records have an NPI populated; about 8–10% of general payments go to physicians whose NPI CMS could not resolve, typically due to name-matching failures at submission time.
  • Submitting_Applicable_Manufacturer_or_Applicable_GPO_Name — the reporting manufacturer. This is the entity making the payment, not necessarily the entity whose product is being promoted; large manufacturers report under a parent entity with subsidiary products.
  • Total_Amount_of_Payment_USDollars — the dollar value of the transfer. For meals this may be $23; for speaking engagements it commonly runs $1,000–$5,000 per event; for named investigators on large trials it can reach six figures annually.
  • Nature_of_Payment_or_Transfer_of_Value — a CMS-defined controlled vocabulary: Food and Beverage, Consulting Fee, Honoraria, Speaker at a Venue, Travel and Lodging, Education, Grant, and several others. This field is the primary filter for isolating the promotional payment categories from research funding.
  • Name_of_Drug_or_Biological_or_Device_or_Medical_Supply — present only when the payment is associated with a specific product. Speaking fees for a manufacturer's branded drug will typically have this field populated; unrestricted grants will not. This is the field that closes the loop between the payment and the specific product prescribed.
  • Date_of_Payment — the date of the individual transfer. For recurring arrangements like speaker bureau enrollment, this is the date of each individual event, not the enrollment date.

Medicare Part D prescribers

The Medicare Part D Prescribers by Provider and Drug dataset is published separately by CMS and covers which physicians (identified by NPI) prescribed which drugs to Medicare Part D beneficiaries, in what quantities, and at what cost to the program. Coverage runs from 2013 through 2022 at data.cms.gov; 2023 data is expected on the same annual publication cycle.

curl https://api.ai-analytics.org/datasets/cms-part-d-prescribers

Each row in the Part D dataset is a provider-drug combination for a given year. The key fields are:

  • Rendering_NPI — the prescribing physician's NPI. This is the join key to Open Payments.
  • drug_name — the drug name as it appears on the Part D claim. CMS standardizes these to generic names in recent years; older data uses brand names inconsistently, requiring a brand-to-generic crosswalk for accurate matching against the Open Payments drug field.
  • tot_clms — total claims submitted for this provider-drug combination in the year.
  • tot_drug_cst — total cost to Medicare for those claims, in dollars. This is the program cost figure, not patient out-of-pocket cost.
  • tot_benes — number of distinct beneficiaries. CMS suppresses rows where this value is fewer than 11, to protect patient privacy. Roughly 15–20% of provider-drug rows are suppressed in any given year.

The dataset runs to tens of millions of provider-drug rows per year across all Part D prescribers. The public API at data.cms.gov returns paginated JSON with no authentication.

The join

The NPI number appears in both datasets. Covered_Recipient_NPI in Open Payments and Rendering_NPI in Part D prescribers are both CMS-assigned National Provider Identifiers from the same NPPES registry. The join is direct:

SELECT
  p.Rendering_NPI,
  p.drug_name,
  SUM(p.tot_drug_cst)                       AS medicare_cost,
  SUM(o.Total_Amount_of_Payment_USDollars)  AS payments_received
FROM part_d_prescribers p
JOIN open_payments_general o
  ON p.Rendering_NPI = o.Covered_Recipient_NPI
WHERE o.Name_of_Drug_or_Biological = p.drug_name
GROUP BY p.Rendering_NPI, p.drug_name
HAVING payments_received > 10000
ORDER BY payments_received DESC;

The WHERE clause filtering on drug name is the critical step. Without it, the join attributes all of a manufacturer's payments to all of the physician's prescriptions, inflating the signal for physicians who prescribe broadly. With it, you isolate the rows where the manufacturer paid the physician specifically for the drug they then prescribed. About 40% of General Payment records have a product name populated; the remainder are unrestricted payments where product specificity is not required by the reporting rules.

The NPI bridge for older Open Payments records

Prior to program year 2016, some Open Payments records identify physicians byPhysician_Profile_ID rather than NPI directly. CMS publishes a physician profile supplement file that maps Profile IDs to NPIs; this file must be joined in as a preprocessing step before the main query. From 2016 onward, Covered_Recipient_NPIis reliably populated for records where CMS could resolve the identity.

-- Preprocessing step for pre-2016 data
UPDATE open_payments_general og
SET Covered_Recipient_NPI = pp.npi
FROM physician_profiles pp
WHERE og.Covered_Recipient_NPI IS NULL
  AND og.Physician_Profile_ID = pp.profile_id;

What the join reveals

This methodology is the engine behind ProPublica's Dollars for Docs project, which first published in 2010 and has run continuous updates as Open Payments data became available. The pattern it surfaces: a physician receives a substantial speaking fee or consulting payment from Manufacturer B for Drug X; in the same period, that physician prescribes Drug X to Medicare patients at two to four times the rate of demographically comparable physicians in the same specialty and geography.

A specific example representative of the literature: a pain management specialist received $85,000 in speaking fees from an opioid manufacturer over three years. In the same period, that physician prescribed the manufacturer's branded opioid formulation at 3.2 times the rate of comparable pain specialists in their state. Medicare paid $2.1 million for those prescriptions. The payment-to-prescribing ratio is 1:24 — for every dollar paid, $24 in Medicare spend followed. The correlation does not prove that the payments caused the prescribing. But it is the signal.

Drug categories where the signal is strongest

Academic research using this methodology has consistently found the payment-to-prescribing correlation highest in specific drug categories:

  • Biologics and specialty drugs. High-cost, manufacturer-dependent products where speakers bureaus are the primary physician education channel. Per-unit margins justify large speaking fees; the Medicare cost per claim is high enough that even modest prescribing volume produces large program spend. TNF inhibitors, IL-17 inhibitors, and oncology biologics show consistent correlation in published analyses.
  • Branded statins versus generic alternatives. After atorvastatin went off patent in 2011, manufacturers of remaining branded statins maintained speakers bureaus promoting brand continuation over generic substitution. The Open Payments + Part D join shows elevated branded statin prescribing among physicians receiving those speaking fees even after bioequivalent generics were available at a fraction of the cost.
  • Opioids (pre-DEA enforcement). The most politically significant application of the methodology. Opioid manufacturers ran extensive speakers bureaus through the 2010s, promoting extended-release formulations. The join shows sharp correlation between speaking fee receipt and branded opioid prescribing volume in the 2013–2018 window, before DEA enforcement actions and state attorney general litigation restructured the market.
  • Antidiabetic agents. The SGLT-2 inhibitor and GLP-1 agonist categories have produced some of the clearest recent payment-to-prescribing signals, with manufacturers competing aggressively for prescriber mindshare in a crowded market of similarly-efficacious branded alternatives.

Cross-reference with LEIE exclusions

The HHS OIG List of Excluded Individuals and Entities (LEIE) is already in the Federal Regulatory Data Hub. Exclusion from LEIE means the individual or entity is prohibited from billing Medicare, Medicaid, and all other federal health programs. An excluded provider cannot legally submit Part D prescriptions that Medicare will reimburse.

The three-way join — Open Payments, Part D prescribers, and LEIE — surfaces a compliance edge case that does not appear in any single dataset alone: providers who are excluded from Medicare billing but who appear in Open Payments as continuing to receive manufacturer payments. Manufacturers are not prohibited from paying excluded providers; they are prohibited from promoting their products through them in ways that induce Medicare prescribing. Whether a speaking fee paid to an excluded provider crosses that line is a fact-specific legal question, but the appearance of the combination is a material compliance flag.

-- Three-way join: payment recipient who is LEIE-excluded
SELECT
  o.Covered_Recipient_NPI,
  o.Covered_Recipient_First_Name,
  o.Covered_Recipient_Last_Name,
  SUM(o.Total_Amount_of_Payment_USDollars) AS total_payments,
  l.EXCLTYPE,
  l.EXCLDATE
FROM open_payments_general o
JOIN leie_exclusions l
  ON o.Covered_Recipient_NPI = l.NPI
WHERE l.REINDATE IS NULL   -- still excluded, not reinstated
GROUP BY
  o.Covered_Recipient_NPI,
  o.Covered_Recipient_First_Name,
  o.Covered_Recipient_Last_Name,
  l.EXCLTYPE,
  l.EXCLDATE
HAVING total_payments > 0
ORDER BY total_payments DESC;

In the 2018–2023 data, this query returns several hundred NPI matches per year — providers who are currently excluded yet appear as payment recipients in the same period. The majority are small amounts (meal reimbursements where the manufacturer may not have screened against LEIE at the time of the event), but a subset involve substantial consulting or speaking arrangements that represent clearer compliance exposure for the manufacturer.

Data quality notes

A few practical issues affect any serious analysis of this join:

  • Drug name normalization. The drug name field in Open Payments is manufacturer-supplied free text. The drug_name field in Part D is CMS- standardized but has changed conventions across years. A brand-to-generic crosswalk (RxNorm is the standard reference) is necessary for accurate matching, especially for biologics where biosimilar naming conventions complicate exact string matching.
  • NPI suppression in Part D. Rows with fewer than 11 beneficiaries are suppressed. This removes a significant fraction of low-volume prescribers — exactly the providers where unusual prescribing patterns (very high volume of a single branded drug) might be most informative. The suppression is a deliberate privacy protection and cannot be worked around.
  • Temporal alignment. The payment and prescribing datasets must be aligned by year carefully. Open Payments reports the date of the individual payment; Part D reports annual aggregate prescribing. A speaking fee paid in December of year N may influence prescribing that only becomes visible in the year N+1 aggregate. Leading and lagging the payment data by one year is standard practice in the literature.
  • Teaching hospital payments. Open Payments includes payments to teaching hospitals as institutions, not just to individual physicians. These appear with a hospital NPI rather than an individual physician NPI and do not join meaningfully to the Part D prescriber dataset, which is individual-provider-level. Filter them out with WHERE Covered_Recipient_Type = 'Covered Recipient Physician'.

Accessing the data

Both datasets are available through the Federal Regulatory Data Hub without authentication. The hub normalizes the drug name field, pre-builds the physician profile NPI bridge for pre-2016 records, and indexes the LEIE exclusion data in the same query layer so the three-way join runs without coordinating three separate government APIs.

# Open Payments general payments, by NPI
curl "https://api.ai-analytics.org/datasets/cms-open-payments?npi=1234567890&year=2022"

# Part D prescribers, by NPI
curl "https://api.ai-analytics.org/datasets/cms-part-d-prescribers?npi=1234567890&year=2022"

# Pre-built payment-to-prescribing join for a single provider
curl "https://api.ai-analytics.org/datasets/cms-open-payments/prescribing-correlation?npi=1234567890"

# LEIE cross-reference: is this NPI currently excluded?
curl "https://api.ai-analytics.org/api/v1/screening/all?npi=1234567890"

The direct dataset endpoints at CMS for independent access:

  • Open Payments public API: https://openpaymentsdata.cms.gov/api/1/ — Socrata SODA protocol, no authentication, 50,000-record page limit, rate-limited.
  • Part D prescribers: https://data.cms.gov/provider-summary-by-type-of-service/medicare-part-d-prescribers — same Socrata stack. Bulk CSV downloads are faster than API pagination for full-corpus pulls.

For the compliance risk scoring methodology that incorporates LEIE exclusions alongside OFAC, SAM, and 28 other enforcement lists: Compliance screening across 30+ federal enforcement lists: how the risk score works →

For the HHS OIG exclusions dataset ingest, LEIE field schema, and the reinstatement lifecycle that determines whether an exclusion is still active: HHS OIG LEIE: ingesting and querying 76,000 Medicare exclusions →