Medicaid Enrollment Data: The Federal Dataset Behind 90 Million Beneficiaries and $900 Billion in Annual Spending

Medicaid is the largest source of health coverage in the United States by enrollment. In 2023 it covered approximately 90 million beneficiaries (more than one in four Americans) through a federal–state partnership that spent roughly $900 billion in combined federal and state funds. Behind those numbers sits a data infrastructure spanning monthly enrollment files, a comprehensive claims warehouse, quarterly expenditure reports, and separate managed care encounter submissions. Understanding what those datasets contain, where to find them, and what they can reveal is the starting point for any serious analysis of American healthcare finance.

Medicaid and CHIP: the federal–state structure

Medicaid was established by Title XIX of the Social Security Act in 1965. It is not a single program but a framework: the federal government sets minimum eligibility and benefit standards, matches state spending at a formula rate, and oversees compliance; each state designs and administers its own program within those federal floors. The result is fifty-plus distinct Medicaid programs, each with its own eligibility rules, benefit packages, provider networks, and delivery systems — which is why Medicaid data is inherently state-stratified.

The Children's Health Insurance Program (CHIP), established by the Balanced Budget Act of 1997 under Title XXI, covers uninsured children in families with incomes above Medicaid eligibility limits. CHIP operates as either a standalone state program, a Medicaid expansion, or a combination of both. Enrollment and expenditure data for CHIP are reported separately from Medicaid but through parallel reporting systems. Together, Medicaid and CHIP account for the overwhelming majority of public health coverage for children and for low-income non-elderly adults.

Medicaid covers a heterogeneous population across five broad eligibility groups: children (historically the largest group); adults added by the ACA expansion at or below 138% of the federal poverty level; aged individuals (65 and older who also qualify on income and asset grounds); blind and disabled individuals receiving Supplemental Security Income; and the “dual eligible” population that qualifies for both Medicaid and Medicare. Each eligibility group carries different per-capita costs, different clinical needs, and different federal matching rates, making eligibility-group disaggregation essential for any expenditure or utilization analysis.

The key data sources

CMS and HHS publish Medicaid data through several distinct systems, each designed for a different analytic purpose.

Medicaid.gov Enrollment Data. Monthly enrollment counts by state and eligibility group, published at data.medicaid.gov through a Socrata-based API. The series goes back to 2014, covering the first year of the ACA Medicaid expansion. This is the primary source for tracking total enrollment trends, state-by-state comparisons, and eligibility-group shifts over time. Data typically lags by two to three months.

T-MSIS (Transformed Medicaid Statistical Information System). The comprehensive claims and enrollment data system that replaced the old Medicaid Statistical Information System (MSIS) beginning around 2014. T-MSIS captures all Medicaid encounters — fee-for-service claims and managed care encounter data — including diagnosis codes, procedure codes, provider information, and prescription drugs. CMS publishes annual T-MSIS Analytic Files (TAF) for research use; broader access requires state-specific data use agreements.

MBES/CBES (Medicaid Budget and Expenditure System / CHIP Budget and Expenditure System). Quarterly expenditure data reported by states, broken down by service category: acute care, long-term services and supports (LTSS), drug spending, Disproportionate Share Hospital (DSH) payments, and administrative costs. MBES data is the authoritative source for federal and state Medicaid spending and the basis for federal matching payment calculations. Published as Excel downloads at Medicaid.gov.

Managed care enrollment reports. CMS requires states to report managed care enrollment separately, covering the number of beneficiaries enrolled in Medicaid managed care organizations (MCOs) by state and plan. These reports feed into the broader enrollment data but are also published as standalone files. As managed care has become the dominant Medicaid delivery model, these reports have grown in policy importance.

Medicaid Drug Utilization database. State-by-state, NDC-level drug reimbursement data covering every drug paid for under Medicaid fee-for-service. This is one of the most granular federal drug spending datasets available, covering unit reimbursements at the drug-state-quarter level. Published at data.medicaid.gov.

Enrollment data structure

The monthly Medicaid enrollment files at data.medicaid.gov are organized primarily by state and eligibility group. Each row in the underlying dataset represents a state–month–eligibility-group combination, with a count of individuals enrolled in that group at that point in time.
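
The state–month–eligibility-group layout described above is a "long" table, and most analyses start by pivoting it. The sketch below uses made-up rows; the column names (state, year_month, eligibility_group, enrollment) and the counts are illustrative assumptions, not the actual field names or figures in the published file.

```python
import pandas as pd

# Hypothetical rows: one per state, month, and eligibility group.
# Column names and counts are illustrative, not taken from the real dataset.
rows = [
    {"state": "OH", "year_month": "2023-01", "eligibility_group": "children", "enrollment": 1_200_000},
    {"state": "OH", "year_month": "2023-01", "eligibility_group": "expansion_adults", "enrollment": 800_000},
    {"state": "OH", "year_month": "2023-02", "eligibility_group": "children", "enrollment": 1_210_000},
    {"state": "OH", "year_month": "2023-02", "eligibility_group": "expansion_adults", "enrollment": 805_000},
]
long_df = pd.DataFrame(rows)

# Pivot to one row per state-month with one column per eligibility group
wide = long_df.pivot_table(
    index=["state", "year_month"],
    columns="eligibility_group",
    values="enrollment",
    aggfunc="sum",
)

# State-month total across eligibility groups
wide["total"] = wide.sum(axis=1)
print(wide)
```

Keeping the data long until the last step makes it easier to add or drop eligibility groups without rewriting the aggregation.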

The eligibility group taxonomy distinguishes between children, pregnant women, adults under the ACA expansion, aged individuals, blind and disabled individuals, and the “other adult” categories that predate the ACA. Within each group, enrollment can be further broken down by type of coverage: fee-for-service (FFS) versus managed care. The managed care breakdown is critical because managed care beneficiaries generate encounter data through MCOs rather than direct FFS claims, which affects how their utilization appears in claims databases.

The ACA Medicaid expansion, effective in participating states starting January 2014, added a new adult eligibility group covering individuals at or below 138% of the federal poverty level who did not previously qualify for Medicaid. The split between the 40 states plus the District of Columbia that had adopted the expansion as of 2024 and the 10 holdout states that had not creates a quasi-natural experiment that has generated a substantial body of research on coverage effects, hospital finances, health outcomes, and state fiscal impacts. Expansion states have meaningfully higher enrollment per capita and different eligibility-group compositions than non-expansion states, which must be controlled for in any national trend analysis.

The COVID enrollment surge and the unwinding

The Families First Coronavirus Response Act (FFCRA), enacted in March 2020, included a provision that prohibited states from disenrolling Medicaid beneficiaries for the duration of the federal public health emergency (PHE), provided the state accepted an enhanced federal matching rate of 6.2 percentage points above its standard FMAP. This “continuous enrollment” provision — effectively a moratorium on eligibility redeterminations — caused Medicaid enrollment to grow continuously from approximately 70 million beneficiaries pre-COVID to a peak of approximately 95 million by early 2023.

The COVID enrollment surge was not driven primarily by genuine new eligibility: many beneficiaries who would otherwise have lost coverage due to income changes, residency changes, or failure to return renewal paperwork remained enrolled because states could not disenroll them. The Consolidated Appropriations Act, 2023 decoupled the continuous enrollment provision from the PHE and ended it on March 31, 2023, so states could resume disenrollments in April 2023. This "unwinding" restarted normal eligibility redeterminations for the first time in three years; the PHE itself formally ended in May 2023. The same law phased down the enhanced FMAP over the course of 2023 and established a timeline for the unwinding process.

The unwinding produced the largest single episode of Medicaid coverage loss since the program's founding. From the peak of approximately 95 million enrollees, national Medicaid enrollment fell to approximately 88 million by the end of 2024 as states worked through their backlogs of eligibility redeterminations. The magnitude and pace of disenrollment varied substantially by state, with some states conducting redeterminations rapidly and others extending the process over more than a year. CMS published detailed monthly unwinding reports tracking disenrollment by state and reason for loss of coverage, distinguishing between procedural disenrollments (renewal paperwork not returned) and substantive disenrollments (ineligible on redetermination). The finding that procedural disenrollments substantially outnumbered substantive disenrollments in many states — meaning coverage was lost for administrative rather than eligibility reasons — generated significant policy scrutiny.
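
The procedural-versus-substantive split reduces to a simple share calculation per state. The sketch below uses placeholder state codes and invented counts purely to show the arithmetic; the column names are assumptions and do not reflect the layout of the CMS unwinding reports.

```python
import pandas as pd

# Hypothetical counts for illustration only; not actual CMS figures
unwinding = pd.DataFrame({
    "state": ["AA", "BB", "CC"],  # placeholder state codes
    "procedural": [70_000, 20_000, 45_000],
    "substantive": [30_000, 60_000, 15_000],
})

unwinding["total_disenrolled"] = unwinding["procedural"] + unwinding["substantive"]
unwinding["procedural_share"] = (
    unwinding["procedural"] / unwinding["total_disenrolled"]
).round(3)

# States where administrative terminations outnumber eligibility-based ones
mostly_procedural = unwinding[unwinding["procedural_share"] > 0.5]

print(unwinding.sort_values("procedural_share", ascending=False).to_string(index=False))
```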

Federal Medical Assistance Percentage (FMAP)

The federal government's share of Medicaid spending is determined by the Federal Medical Assistance Percentage, calculated annually for each state using a formula that is inversely related to the state's per capita income relative to the national average. Wealthier states receive a lower federal match; poorer states receive a higher match. The standard FMAP ranges from a floor of 50% for the highest-income states (California, Connecticut, New York, and Massachusetts typically receive 50%) to approximately 77% for Mississippi, the highest rate in practice and below the statutory ceiling of 83%. The FMAP is published annually in the Federal Register and forms the basis for all federal matching payment calculations in MBES.
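
The relationship between per capita income and the match rate can be sketched in a few lines. The version below follows the statutory formula as commonly stated (state share equals 0.45 times the squared ratio of state to national per capita income, with a 50% floor and an 83% ceiling); those constants, the function name, and the income inputs are assumptions added here for illustration, and the official calculation uses multi-year income averages that this sketch does not reproduce.

```python
def standard_fmap(state_pci: float, national_pci: float,
                  floor: float = 0.50, ceiling: float = 0.83) -> float:
    """Approximate standard FMAP from per capita income (PCI).

    Assumed formula: FMAP = 1 - 0.45 * (state_pci / national_pci) ** 2,
    clamped to [floor, ceiling]. Inputs here are hypothetical.
    """
    state_share = 0.45 * (state_pci / national_pci) ** 2
    return min(max(1.0 - state_share, floor), ceiling)


# Hypothetical incomes chosen to show the floor, the midpoint, and a high match
for label, pci in [("high-income", 60_000), ("average", 50_000), ("low-income", 35_000)]:
    print(f"{label:12s} FMAP = {standard_fmap(pci, 50_000):.2%}")
```

Because the income ratio is squared, the match rises quickly as a state's income falls below the national average, and the floor binds for any state much above it.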

Enhanced FMAP rates apply to specific populations and programs. The ACA Medicaid expansion adult group received a 100% federal match in the first three years (2014–2016), stepping down in stages to 90% from 2020 onward, which makes the expansion financially attractive for states compared with the standard FMAP. CHIP receives an enhanced FMAP that is higher than the standard Medicaid rate (up to approximately 15 percentage points above it), making CHIP relatively cheap for states to administer. The Basic Health Program (BHP), operated by a handful of states, receives a 95% federal match on the subsidies that would otherwise have gone to Marketplace coverage.
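
For analysis it helps to encode these special rates as small lookup helpers. The sketch below assumes the expansion match schedule of 100% for 2014 through 2016, then 95%, 94%, and 93% for 2017 through 2019, and 90% from 2020 onward, and it assumes the CHIP enhanced FMAP is the standard FMAP plus 30% of the remaining state share, capped at 85%. The intermediate step-down rates, the CHIP formula, and the function names are not stated in the text above and should be checked against current law before use.

```python
# Assumed step-down schedule for the ACA expansion adult group
EXPANSION_MATCH = {2014: 1.00, 2015: 1.00, 2016: 1.00,
                   2017: 0.95, 2018: 0.94, 2019: 0.93}


def expansion_match(year: int) -> float:
    """Federal match for the expansion adult group in a given calendar year."""
    if year < 2014:
        raise ValueError("the expansion group did not exist before 2014")
    return EXPANSION_MATCH.get(year, 0.90)  # 90% from 2020 onward


def chip_efmap(fmap: float, cap: float = 0.85) -> float:
    """Assumed CHIP enhanced FMAP: cut the state share by 30%, capped at 85%."""
    return min(fmap + 0.30 * (1.0 - fmap), cap)


print(expansion_match(2016), expansion_match(2018), expansion_match(2024))
print(round(chip_efmap(0.50), 4), round(chip_efmap(0.77), 4))
```

Under this assumed formula the CHIP bump is largest (15 points) for a state at the 50% floor and shrinks as the standard FMAP rises.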

Congress has periodically enacted temporary FMAP enhancements as fiscal stimulus. The American Recovery and Reinvestment Act of 2009 added 6.2 percentage points to each state's standard FMAP for the duration of the recession. The FFCRA added the same 6.2-point enhancement in 2020, during the COVID PHE. The American Rescue Plan Act of 2021 further enhanced FMAP for home and community-based services (HCBS); that enhancement subsequently expired on schedule. Each of these temporary enhancements appears as a distinct kink in the MBES expenditure series, and any longitudinal analysis must account for the enhanced-FMAP periods when comparing federal versus state spending shares over time.
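
The kink is easy to see with a worked split of one quarter's spending. This is a minimal sketch: the helper name and the dollar amount are hypothetical, and it applies a flat percentage-point bump to total spending, ignoring the fact that real enhancements exclude some populations and service categories.

```python
def split_spending(total: float, fmap: float, bump_points: float = 0.0):
    """Return (federal, state) dollars for a given FMAP plus a temporary bump.

    bump_points is in percentage points, e.g. 6.2 for a 6.2-point enhancement.
    """
    federal_rate = min(fmap + bump_points / 100.0, 1.0)
    federal = total * federal_rate
    return federal, total - federal


quarter_total = 1_000.0  # hypothetical spending, in millions of dollars
base_fed, base_state = split_spending(quarter_total, fmap=0.50)
bump_fed, bump_state = split_spending(quarter_total, fmap=0.50, bump_points=6.2)

print(f"standard FMAP: federal {base_fed:.1f}, state {base_state:.1f}")
print(f"with 6.2 bump: federal {bump_fed:.1f}, state {bump_state:.1f}")
```

For a state at the 50% floor, the bump moves 62 of every 1,000 dollars from the state column to the federal column with no change in total spending, which is why federal-share trends need to be read against the enhancement calendar.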

T-MSIS: the comprehensive claims warehouse

The Transformed Medicaid Statistical Information System is the most granular federal Medicaid dataset available. T-MSIS receives data submissions from all 50 states and DC covering every Medicaid encounter — fee-for-service claims from providers billing directly to the state Medicaid program, and managed care encounter data submitted by MCOs to states and then forwarded to CMS.

The T-MSIS data model is organized into several file types: inpatient claims, long-term care claims, other services claims (outpatient, physician, pharmacy), managed care plan files, and eligibility files. Each claim record includes ICD-10 diagnosis codes, HCPCS or CPT procedure codes, National Drug Codes (NDC) for pharmacy claims, provider NPI, service dates, and allowed and paid amounts.

CMS prepares the T-MSIS Analytic Files (TAF) as cleaned, research-ready annual extracts. The TAF are available through the CMS Virtual Research Data Center (VRDC) and, for approved projects, through state data sharing agreements. The TAF have become the primary source for academic and policy research on Medicaid utilization, outcomes, and spending at the individual beneficiary level. Topics studied include prescription opioid dispensing patterns, dental care access, emergency department utilization, and quality of care for chronic conditions.

A significant limitation of T-MSIS is data quality heterogeneity. Not all states submit complete and timely data, and managed care encounter data completeness varies substantially by state and plan. CMS publishes annual T-MSIS data quality reports that flag states with known completeness or accuracy issues. Researchers using T-MSIS for cross-state comparisons must account for these data quality differences, which can confound results if not addressed.

Managed care enrollment and the shift from fee-for-service

Approximately 70% of Medicaid beneficiaries — and an even higher share in expansion states — are enrolled in managed care organizations (MCOs) rather than traditional fee-for-service Medicaid. Under managed care, the state pays the MCO a fixed per-member-per-month capitation rate in exchange for the MCO providing comprehensive Medicaid benefits to its enrolled members. The MCO bears the financial risk of member utilization and manages the provider network, care management programs, and quality improvement activities.
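
The capitation arrangement can be illustrated with a small calculation: the state's payment is the per-member-per-month rate times member months, summed over rate cells, and it does not depend on how much care members actually use. The rate cells, rates, and member-month counts below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class RateCell:
    name: str
    pmpm_rate: float    # dollars per member per month (hypothetical)
    member_months: int  # enrolled members summed across months in the period


def total_capitation(cells):
    """State payment to the MCO: rate times member months, summed over cells."""
    return sum(c.pmpm_rate * c.member_months for c in cells)


cells = [
    RateCell("children", 250.0, 120_000),
    RateCell("expansion_adults", 600.0, 60_000),
    RateCell("aged_disabled", 1_800.0, 10_000),
]
payment = total_capitation(cells)
print(f"Total capitation payment: ${payment:,.0f}")
```

Whether the MCO's medical costs come in above or below this figure is the plan's gain or loss, which is the risk transfer the paragraph above describes.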

CMS requires managed care plans to submit encounter data to states and, through states, to T-MSIS. Actuarial soundness requirements mandate that capitation rates be set at a level sufficient to cover expected medical costs for the enrolled population, developed using sound actuarial principles by a qualified actuary. Rate development is one of the most contested aspects of Medicaid managed care policy; states have an incentive to set rates as low as possible to control spending, while plans have an incentive to push rates higher.

Quality measurement for Medicaid managed care flows through two primary frameworks: HEDIS (Healthcare Effectiveness Data and Information Set), the industry-standard quality measure set developed by NCQA; and the Medicaid Adult and Child Core Sets, which are standardized measure sets required by CMS for state reporting. States must publicly report their managed care plan performance on the Core Set measures annually. CMS publishes the aggregated state-level Core Set results, allowing national comparisons of Medicaid quality across states and delivery systems.

The shift toward managed care has been nearly universal for the acute care population — children, adults, and pregnant women. The LTSS population — aged and disabled individuals needing long-term services — has been slower to move into managed care due to the complexity of coordinating home and community-based services, but managed LTSS (MLTSS) programs have expanded substantially in the past decade.

Expenditure data: MBES and the cost breakdown

The Medicaid Budget and Expenditure System tracks quarterly spending by state, broken into detailed service categories. The major expenditure categories in MBES are:

  • Acute care services. Inpatient hospital, outpatient hospital, physician and practitioner services, laboratory and radiology, and prescription drugs under FFS. This is the largest spending category for the non-elderly non-disabled population.
  • Long-term services and supports (LTSS). Nursing facility services and home and community-based services (HCBS) waiver programs. LTSS is the highest-cost category per beneficiary for the aged and disabled population. Medicaid pays for approximately 42% of all long-term care expenditure in the United States, making it the dominant payer for nursing home and home health services. The average annual Medicaid expenditure for a nursing facility resident is several times higher than for a non-disabled adult.
  • Disproportionate Share Hospital (DSH) payments. A supplemental payment stream to hospitals that serve a disproportionate share of Medicaid and uninsured patients. DSH payments are capped at the federal and hospital levels and have been subject to periodic reductions under ACA and subsequent legislation.
  • Administrative costs. State agency staffing, eligibility systems, managed care contract administration, and program integrity activities. Federal matching for administrative costs is generally at 50% regardless of the standard FMAP.
  • Capitation payments to managed care. As states shift to managed care, an increasing fraction of MBES spending is reported as capitation payments rather than FFS service-category spending. This shift complicates longitudinal comparisons of expenditure by service type.

Dual eligibles: the highest-cost population

Approximately 12 million people are dually eligible for both Medicare and Medicaid — “dual eligibles” in federal nomenclature. To qualify for both programs simultaneously, an individual must meet Medicare eligibility (typically age 65 or older, or disabled) and also meet Medicaid income and asset requirements.

Dual eligibles are by far the highest-cost beneficiary population in either program. Average annual spending for a full dual eligible exceeds $35,000, compared with approximately $8,000 for a non-dual Medicaid adult. The elevated cost reflects the complex medical and functional needs of this population: the aged dual population typically has multiple chronic conditions, significant functional limitations, and high rates of nursing facility use; the disabled dual population has significant behavioral health and long-term services needs.

The coordination of Medicare and Medicaid benefits for dual eligibles is one of the most complex administrative challenges in American health policy. Medicare is the primary payer for most services; Medicaid acts as the secondary payer, covering Medicare cost-sharing (premiums, deductibles, and copayments) and services that Medicare does not cover at all, including long-term care and most dental and vision services. For full duals, Medicaid pays 100% of Medicare Part B premiums and all Medicare cost-sharing, effectively making the dual eligible's combined Medicare and Medicaid coverage nearly free at the point of service.

Federal and state programs have attempted to better integrate care for dual eligibles through several mechanisms: the Financial Alignment Initiative demonstrations that ran in multiple states from 2013 forward, testing capitated integrated care models (the FIDE-SNP model and the Medicare-Medicaid Plan model); the Programs of All-Inclusive Care for the Elderly (PACE), which provide integrated medical and LTSS through a capitated per-member rate covering both Medicare and Medicaid benefits; and Dual Eligible Special Needs Plans (D-SNPs), which are Medicare Advantage (MA) plans that specifically target dual eligibles and coordinate with state Medicaid programs.

Medicaid and long-term care: the de facto inheritance mechanism

Medicaid's role as the primary payer of long-term care has profound implications for middle-class Americans who did not plan for nursing home costs. Federal law requires that Medicaid applicants meet asset limits (typically $2,000 for an individual in most states) in addition to income limits. An individual who accumulates significant assets over a working lifetime — a house, retirement savings, a car — and then requires nursing home care at $8,000–$12,000 per month must spend down those assets to Medicaid eligibility levels before the program will pay for care.
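
A rough sense of the spend-down timeline comes from dividing assets above the limit by the monthly cost of care. The sketch below is deliberately simplified: it uses the $2,000 asset limit and a monthly cost in the range quoted above, treats all assets as countable, and ignores exemptions, spousal protections, and look-back rules. The function name and the example household are hypothetical.

```python
import math


def months_to_spend_down(countable_assets: float, monthly_cost: float,
                         asset_limit: float = 2_000.0,
                         monthly_income: float = 0.0) -> int:
    """Whole months of private payment before assets fall to the limit."""
    excess = countable_assets - asset_limit
    if excess <= 0:
        return 0  # already at or below the asset limit
    net_monthly_draw = monthly_cost - monthly_income
    if net_monthly_draw <= 0:
        raise ValueError("income covers the full cost; assets are not drawn down")
    return math.ceil(excess / net_monthly_draw)


# Hypothetical: $302,000 in countable assets, $10,000 per month for care
print(months_to_spend_down(302_000, 10_000))
# Same assets, with $2,000 per month of income applied to the bill
print(months_to_spend_down(302_000, 10_000, monthly_income=2_000))
```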

The “Medicaid spend-down” mechanism functions as a de facto asset test on inheritance: assets that might otherwise pass to children or other heirs are consumed by nursing home costs until Medicaid eligibility is reached. Federal Medicaid estate recovery rules further require that states seek recovery of Medicaid costs from the estates of deceased beneficiaries who received Medicaid after age 55. The combination of spend-down and estate recovery creates a system in which long-term care costs effectively liquidate the life savings of the nursing-home-eligible population before federal coverage begins.

Section 1115 waivers allow states to pilot alternative long-term care delivery models under Medicaid authority. The most common application is managed LTSS — enrolling aged and disabled Medicaid beneficiaries in MCOs that are responsible for coordinating both medical and long-term services. HCBS waiver programs, authorized under Section 1915(c), allow states to provide home and community-based alternatives to institutional care to individuals who would otherwise require nursing facility placement. Both types of waiver generate enrollment and expenditure data that flows into T-MSIS and MBES.

Access and APIs

The primary public access point for Medicaid data is the Medicaid.gov data hub at data.medicaid.gov, which runs on a Socrata platform and exposes a standard Socrata query API. The enrollment dataset, expenditure data, and drug utilization data are all accessible through this API, supporting both tabular downloads and programmatic access via the OData and Socrata Query Language (SoQL) interfaces.

The Medicaid Drug Utilization database at data.medicaid.gov provides NDC-level drug reimbursement data by state and quarter, making it possible to track Medicaid spending on specific drugs across states and over time. This dataset has been used extensively to study Medicaid formulary restrictions on opioids, access to hepatitis C treatments, and the uptake of biosimilars under Medicaid.

T-MSIS data for research is accessed through the CMS Virtual Research Data Center (VRDC). Researchers submit a data use agreement and a research plan; CMS reviews and approves qualified projects. The process is substantially more cumbersome than accessing public use files, but the VRDC provides access to individual-level claims data that cannot be published in identifiable form. MACPAC (the Medicaid and CHIP Payment and Access Commission) also conducts research using T-MSIS under its own data access arrangements and publishes analyses based on this data.

MBES expenditure data is published as Excel downloads from Medicaid.gov, typically on a quarterly lag. There is no API for MBES; researchers must download the Excel files and parse them. The file format has changed across years, requiring format-specific parsing for historical series construction.

Python: tracking state enrollment during the COVID unwinding

The following script downloads monthly Medicaid enrollment data from the Medicaid.gov Socrata API, computes the enrollment change from January 2023 (used as the pre-unwinding baseline, before states resumed disenrollments) to the most recent available month, and ranks states by percentage enrollment decline. It flags each state as an expansion or non-expansion state and computes the national monthly trend. Install dependencies with pip install requests pandas.

import requests
import pandas as pd

# -------------------------------------------------------
# Medicaid Enrollment Unwinding Tracker
# Downloads monthly enrollment data from data.medicaid.gov
# (Socrata-based API), computes state-level changes from
# January 2023 (pre-unwinding baseline) to the most recent
# available month, and ranks states by enrollment decline.
# -------------------------------------------------------

# Socrata dataset ID for Medicaid/CHIP monthly enrollment
# "Medicaid and CHIP Monthly Enrollment" at data.medicaid.gov
DATASET_ID = "n5ce-jxme"
BASE_URL = f"https://data.medicaid.gov/api/1/datastore/query/{DATASET_ID}/0"

# Expansion states as of 2024 (40 states + DC = 41 jurisdictions)
EXPANSION_STATES = {
    "AK","AZ","AR","CA","CO","CT","DE","DC","HI","ID","IL","IN","IA",
    "KY","LA","ME","MD","MA","MI","MN","MO","MT","NE","NV","NH","NJ",
    "NM","NY","NC","ND","OH","OK","OR","PA","RI","SD","UT","VT","VA",
    "WA","WV",
}

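def fetch_enrollment(conditions=None, limit=5000, max_pages=200):
    """Fetch all rows from the enrollment dataset, paging with limit/offset."""
    frames = []
    offset = 0
    for _ in range(max_pages):  # hard stop in case the server ignores offset
        payload = {
            "limit": limit,
            "offset": offset,
            "conditions": conditions or [],
        }
        resp = requests.post(BASE_URL, json=payload, timeout=60)
        resp.raise_for_status()
        rows = resp.json().get("results", [])
        if not rows:
            break  # past the last page
        frames.append(pd.DataFrame(rows))
        offset += len(rows)  # still correct if the server caps the page size
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()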


# -------------------------------------------------------
# Step 1: Pull all enrollment rows; filter to total enrollment
# rows (the dataset contains both eligibility-group breakdowns
# and state totals; we use the "Total" eligibility group row
# when present, or sum across groups otherwise).
# -------------------------------------------------------
print("Downloading Medicaid enrollment data from data.medicaid.gov ...")
df = fetch_enrollment(limit=10000)

# Normalize column names
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

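# Core columns: state, month (YYYY-MM or similar), enrollment count.
# Actual column names may vary by dataset vintage; adjust as needed.
# Prefer a state-abbreviation column, because EXPANSION_STATES holds
# two-letter codes; fall back to the first column mentioning "state".
state_col = next(
    (c for c in df.columns if "state" in c and "abbr" in c),
    next((c for c in df.columns if "state" in c), None),
)
date_col = next((c for c in df.columns if "month" in c or "date" in c), None)
enroll_col = next((c for c in df.columns if "enrollment" in c or "total" in c), None)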

if not all([state_col, date_col, enroll_col]):
    print("Available columns:", list(df.columns))
    raise ValueError("Could not identify state/date/enrollment columns; inspect output above.")

df = df[[state_col, date_col, enroll_col]].rename(
    columns={state_col: "state", date_col: "year_month", enroll_col: "enrollment"}
)

df["enrollment"] = pd.to_numeric(df["enrollment"], errors="coerce").fillna(0)
df["year_month"] = df["year_month"].astype(str).str[:7]  # normalize to YYYY-MM

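# Keep the most recent 24 months, extended back to 2023-01 when needed so
# the pre-unwinding baseline used in Step 3 is never filtered out
all_months = sorted(df["year_month"].unique())
window_start = min(all_months[-24:] + ["2023-01"])
df = df[df["year_month"] >= window_start]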

# State-level totals per month (sum across eligibility groups if duplicates)
monthly = (
    df.groupby(["state", "year_month"])["enrollment"]
    .sum()
    .reset_index()
)

# -------------------------------------------------------
# Step 2: Compute national monthly total
# -------------------------------------------------------
national = (
    monthly.groupby("year_month")["enrollment"]
    .sum()
    .reset_index()
    .rename(columns={"enrollment": "national_enrollment"})
    .sort_values("year_month")
)
print("\nNational monthly enrollment trend (last 24 months):")
print(national.to_string(index=False))

# -------------------------------------------------------
# Step 3: Baseline = Jan 2023 (pre-unwinding peak)
# Most recent = last available month
# -------------------------------------------------------
BASELINE_MONTH = "2023-01"
latest_month = sorted(monthly["year_month"].unique())[-1]

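baseline = monthly[monthly["year_month"] == BASELINE_MONTH].set_index("state")["enrollment"]
latest = monthly[monthly["year_month"] == latest_month].set_index("state")["enrollment"]
if baseline.empty:
    raise ValueError(
        f"No rows found for baseline month {BASELINE_MONTH}; check the date format "
        "and that the baseline month was not filtered out upstream."
    )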

# States present in both periods
common_states = baseline.index.intersection(latest.index)
change = pd.DataFrame({
    "state": common_states,
    "enrollment_jan_2023": baseline[common_states].values,
    f"enrollment_{latest_month}": latest[common_states].values,
})
change["change"] = change[f"enrollment_{latest_month}"] - change["enrollment_jan_2023"]
change["pct_change"] = (change["change"] / change["enrollment_jan_2023"] * 100).round(1)
change["expansion"] = change["state"].isin(EXPANSION_STATES).map({True: "Yes", False: "No"})
change = change.sort_values("pct_change")

print(f"\nState enrollment change: {BASELINE_MONTH} (pre-unwinding) to {latest_month}")
print(change[["state", "expansion", "enrollment_jan_2023",
              f"enrollment_{latest_month}", "change", "pct_change"]].to_string(index=False))

# -------------------------------------------------------
# Step 4: Summary statistics by expansion status
# -------------------------------------------------------
print("\nMedian enrollment change by expansion status:")
print(
    change.groupby("expansion")["pct_change"]
    .median()
    .rename("median_pct_change")
    .to_string()
)

print(f"\nNational peak-to-latest: {national['national_enrollment'].max():,.0f} peak, "
      f"{national[national['year_month'] == latest_month]['national_enrollment'].values[0]:,.0f} latest")

Several implementation notes for production use. The Socrata dataset ID and exact column names at data.medicaid.gov may vary across dataset vintages; the script includes defensive column detection that prints available columns if the expected names are not found. The enrollment figures in the monthly files typically represent point-in-time counts at the end of each month, not cumulative beneficiaries served; a beneficiary enrolled in all 12 months of a year is counted once per month, not once per year. When computing national totals, verify that the dataset does not include a pre-aggregated national row, because summing state rows plus a national summary row will double-count. The expansion state set reflects states that had expanded as of 2024; four states (Oklahoma, Missouri, South Dakota, and North Carolina) expanded between 2021 and 2023 and should be classified carefully if analyzing pre-expansion periods.
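
The double-counting check mentioned above can be sketched as a small filter applied before any national sum. The set of labels below is a guess at how a pre-aggregated national row might be named; it is not taken from the dataset, so inspect the distinct values in the state column and adjust.

```python
import pandas as pd

# Hypothetical labels a national summary row might carry; verify against the data
NATIONAL_LABELS = {"US", "USA", "NATIONAL", "TOTAL", "TOTALS", "UNITED STATES"}


def drop_national_rows(df: pd.DataFrame, state_col: str = "state") -> pd.DataFrame:
    """Remove rows whose state value looks like a national aggregate."""
    labels = df[state_col].astype(str).str.strip().str.upper()
    return df[~labels.isin(NATIONAL_LABELS)]


# Made-up sample with two states and a pre-aggregated total row
sample = pd.DataFrame({
    "state": ["OH", "TX", "Totals"],
    "year_month": ["2023-01"] * 3,
    "enrollment": [3_000_000, 5_000_000, 8_000_000],
})

naive_total = sample["enrollment"].sum()
clean_total = drop_national_rows(sample)["enrollment"].sum()
print(f"naive sum: {naive_total:,}  after dropping national rows: {clean_total:,}")
```

In the tracker script, a filter like this would sit just before the state-level groupby so that both the state ranking and the national trend exclude the summary row.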

The percentage decline metric is the standard measure used by CMS and researchers to track unwinding. The distinction between expansion and non-expansion states matters because expansion states had larger ACA adult enrollment that was more likely to include individuals who gained coverage during the PHE period and lost it during unwinding; non-expansion states' unwinding was concentrated more heavily in the child and traditional adult eligibility groups.

