CMS Hospital Cost Reports: The Federal Database Behind Hospital Financial Data for 6,000 US Facilities

The CMS Hospital Cost Report (HCRIS) database contains detailed financial and utilization data for every Medicare-participating hospital—revenues, costs, charges, staffing, beds, and patient days—submitted annually and publicly available, making it the most comprehensive source of US hospital financial data.

What HCRIS is

The Healthcare Cost Report Information System (HCRIS) is the federal repository for Medicare cost reports filed by every provider participating in Medicare and Medicaid. The obligation spans a wide range of facility types: acute care hospitals, skilled nursing facilities, home health agencies, hospices, renal dialysis facilities, and other providers must each submit an annual cost report to their Medicare Administrative Contractor (MAC)—the regional private insurer that CMS contracts to process Medicare claims and audit provider cost reports.

For acute care hospitals, the governing form is CMS-2552, commonly called the Hospital Cost Report. It must be filed within five months of the hospital's fiscal year end. Approximately 6,000 hospitals file annually. After submission, the MAC reviews the report, performs field audits on a risk-stratified sample, and issues a Notice of Program Reimbursement settling the cost report—a process that typically takes one to three years. CMS publishes both “as-submitted” and “settled” versions of the data; settled reports reflect MAC-audited figures and are the appropriate basis for financial analysis.

Because cost reports are filed under penalty of false statement and audited by MACs, HCRIS is qualitatively different from voluntary hospital surveys. It is the only federal dataset that contains audited, hospital-level financial data spanning all facility types regardless of ownership—nonprofit, for-profit, and government. The primary practical limitation is the data lag: the most recently settled data is typically two to three years behind the current calendar year, reflecting the time required for MAC settlement.

What the cost report contains

Form CMS-2552 consists of approximately 100 worksheets organized by function. The worksheets most relevant to financial analysis are:

Worksheet S — Statistical data

Worksheet S captures hospital characteristics and utilization statistics. Worksheet S-2 reports the hospital's CMS Certification Number (CCN), ownership type, teaching status, bed counts by unit type, and Medicare provider agreement dates. Worksheet S-3 reports bed utilization in detail: licensed beds, available beds, and staffed beds by cost center, along with total patient days (inpatient and outpatient), total discharges, and occupancy rates. Worksheet S-3 also disaggregates patient days and discharges by payer—Medicare, Medicaid, commercial insurance, self-pay—enabling payer mix analysis at the hospital level. Full-time equivalent employee counts by job category (registered nurses, licensed practical nurses, aides, administrative staff, physicians) appear in Worksheet S-3 Part II.

Worksheet S-10, introduced with Form CMS-2552-10 and fully phased into the DSH uncompensated care payment formula by 2020, reports uncompensated care: charity care charges, charity care costs (charges deflated by the cost-to-charge ratio), bad debt expense, and Medicaid shortfall. S-10 data is now audited by MACs and used by CMS to allocate approximately $8 billion annually in DSH uncompensated care pool payments. It is also commonly used to assess how nonprofit hospitals meet their 501(r) community benefit obligations.

Worksheet A — Reclassified trial balance

Worksheet A is a trial balance of all hospital operating expenses organized by cost center. Cost centers fall into two categories. Overhead cost centers include administration and general, plant operations, housekeeping, dietary, laundry and linen, employee benefits, and medical records—departments that support patient care but do not directly generate patient revenue. Direct patient care cost centers include all clinical service lines: routine nursing units, intensive care units, step-down units, emergency department, and ancillary services such as radiology, laboratory, pharmacy, physical therapy, operating room, and specialty surgical suites. Each cost center captures total salaries, total other operating costs, and total costs, providing a granular decomposition of hospital operating structure that no other public dataset replicates.

Worksheets B and B-1 — Cost allocation

Worksheets B and B-1 perform the stepdown cost allocation that distributes overhead cost center expenses to direct patient care cost centers. The stepdown methodology allocates each overhead cost center to all remaining cost centers in a defined sequence using relative value units as allocation bases: housekeeping costs are distributed by square footage, dietary costs by meal equivalents, laundry by pounds of linen processed, employee benefits by gross salaries, and administrative and general costs by accumulated cost. After the stepdown, each direct patient care cost center carries its own direct costs plus its proportionate share of all allocated overhead. This fully loaded cost center cost is the foundation for all subsequent Medicare cost determination.
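
The stepdown sequence described above can be sketched in a few lines. This is a minimal illustration, not the Worksheet B logic itself: the cost centers, dollar amounts, allocation statistics, and the function name `stepdown` are all hypothetical.

```python
def stepdown(direct_costs, overhead_order, bases):
    """Close out overhead centers in sequence, pushing cost to remaining centers.

    direct_costs:   {center: cost before allocation}
    overhead_order: overhead centers in the order they are closed out
    bases:          {overhead_center: {receiving_center: allocation statistic}}
    """
    costs = dict(direct_costs)
    closed = set()
    for center in overhead_order:
        closed.add(center)
        # A center that is already closed can no longer receive cost.
        stats = {k: v for k, v in bases[center].items() if k not in closed}
        total_stat = sum(stats.values())
        pool = costs[center]
        for receiver, stat in stats.items():
            costs[receiver] += pool * stat / total_stat
        costs[center] = 0.0
    return costs

# Hypothetical figures for illustration only.
direct = {
    "housekeeping": 100_000.0,
    "dietary": 50_000.0,
    "routine_nursing": 800_000.0,
    "radiology": 300_000.0,
}
bases = {
    # square feet served
    "housekeeping": {"dietary": 1_000, "routine_nursing": 6_000, "radiology": 3_000},
    # meal equivalents
    "dietary": {"routine_nursing": 9_000, "radiology": 1_000},
}
loaded = stepdown(direct, ["housekeeping", "dietary"], bases)
for center, cost in loaded.items():
    print(f"{center:<16} {cost:>12,.0f}")
```

Because dietary receives a share of housekeeping before it is closed out itself, the order of the sequence changes the result, which is why the cost report fixes that order.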

Worksheet C — Cost centers and charges

Worksheet C computes, for each ancillary cost center, the ratio of Medicare charges to total charges—the cost center's Medicare utilization fraction. Multiplying fully loaded cost center costs by this ratio produces Medicare-allowable costs for each service line. Worksheet C also applies the lower-of-cost-or-charges test: Medicare pays the lower of Medicare-allowable cost or what the hospital charges Medicare patients. In practice, actual costs are almost universally below chargemaster prices, so cost governs.
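
As a rough sketch of the two steps just described, the following applies a Medicare utilization fraction to a fully loaded cost and then takes the lower of cost or charges. The function names and all dollar figures are hypothetical, and the real worksheet carries many more adjustments.

```python
def medicare_allowable_cost(loaded_cost, medicare_charges, total_charges):
    """Fully loaded cost times the Medicare share of charges for the cost center."""
    medicare_share = medicare_charges / total_charges
    return loaded_cost * medicare_share

def lower_of_cost_or_charges(allowable_cost, medicare_charges):
    """Medicare pays the smaller of allowable cost and charges to Medicare patients."""
    return min(allowable_cost, medicare_charges)

# Hypothetical radiology cost center.
allowable = medicare_allowable_cost(
    loaded_cost=336_000.0, medicare_charges=400_000.0, total_charges=1_000_000.0
)
payable = lower_of_cost_or_charges(allowable, medicare_charges=400_000.0)
print(f"allowable={allowable:,.0f} payable={payable:,.0f}")
```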

Worksheets D and D-1 — Medicare inpatient cost and reimbursement

Worksheet D computes total Medicare inpatient costs from the cost center data on Worksheet C. Worksheet D-1 performs the Medicare inpatient reimbursement settlement: it aggregates DRG base payments, capital passthrough payments, outlier payments, the disproportionate share hospital (DSH) adjustment, and the indirect medical education (IME) adjustment, then compares total Medicare reimbursement against interim payments already made during the year to determine the net settlement amount.
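
The settlement arithmetic on Worksheet D-1 reduces to a sum and a difference. The sketch below assumes illustrative dollar amounts and a hypothetical helper named `net_settlement`; it is not a reproduction of the worksheet's line structure.

```python
def net_settlement(payment_components, interim_payments):
    """Return (total reimbursement, balance).

    A positive balance is owed to the hospital; a negative balance is owed back.
    """
    total = sum(payment_components.values())
    return total, total - interim_payments

# Hypothetical dollar amounts for illustration only.
components = {
    "drg_base": 40_000_000.0,
    "capital": 3_000_000.0,
    "outlier": 1_500_000.0,
    "dsh": 4_000_000.0,
    "ime": 2_500_000.0,
}
total, balance = net_settlement(components, interim_payments=50_000_000.0)
print(f"total={total:,.0f} balance={balance:,.0f}")
```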

Worksheets E and E-1 — Medicare outpatient reimbursement

Worksheets E and E-1 cover Medicare outpatient reimbursement. With the shift from cost-based to prospective payment for hospital outpatient services under the Outpatient Prospective Payment System (OPPS), these worksheets primarily settle capital passthrough payments and certain cost-based outpatient services—most outpatient payment is now prospective and flows through claims rather than cost report settlement.

Worksheet G — Balance sheet and income statement

Worksheet G provides the hospital's balance sheet and income statement as reported to Medicare. It captures total assets, total liabilities, net equity, total revenues (patient service revenue, other operating revenue, non-operating revenue), and total operating expenses. Worksheet G is the source for operating margin computation: operating income (net patient service revenue plus other operating revenue minus operating expenses) divided by total revenues. A true EBITDA margin would additionally add back depreciation, amortization, and interest expense. Unlike voluntary financial disclosures, Worksheet G figures are subject to MAC audit, providing a more reliable basis for cross-hospital financial comparison than IRS Form 990 Schedule H data, which is self-reported and unaudited.

Key metrics derivable from cost reports

HCRIS supports a wide range of financial and operational metrics not available from any other single federal source:

Cost-to-charge ratio

The cost-to-charge ratio (CCR) is total costs divided by total charges. For most hospitals it falls between 0.10 and 0.50, meaning the hospital's actual costs are 10 to 50 cents per dollar billed. CMS uses the CCR internally to convert charges to cost equivalents for outlier payment qualification: when a patient's charges exceed the outlier fixed loss threshold, CMS multiplies those charges by the hospital's CCR to estimate actual cost before applying the 80% outlier payment formula. Researchers use the CCR to deflate charge data from all-payer discharge databases and state inpatient records into cost-equivalent values for cost accounting analysis.
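
A simplified sketch of the CCR and its use in estimating outlier cost follows. The `cost_threshold` argument is an assumption that stands in for the full threshold calculation, and the hospital and claim figures are hypothetical.

```python
def cost_to_charge_ratio(total_costs, total_charges):
    """Hospital-wide CCR: cents of cost per dollar billed."""
    return total_costs / total_charges

def estimated_outlier_payment(claim_charges, ccr, cost_threshold, marginal_factor=0.80):
    """Deflate charges to estimated cost, then pay 80% of cost above the threshold.

    Simplification: cost_threshold is treated as a single dollar figure.
    """
    estimated_cost = claim_charges * ccr
    return marginal_factor * max(0.0, estimated_cost - cost_threshold)

# Hypothetical hospital and claim.
ccr = cost_to_charge_ratio(total_costs=120_000_000.0, total_charges=400_000_000.0)
payment = estimated_outlier_payment(claim_charges=500_000.0, ccr=ccr, cost_threshold=100_000.0)
print(f"CCR={ccr:.2f} outlier payment={payment:,.0f}")
```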

Occupancy rate

Occupancy rate is computed as total inpatient days divided by the product of licensed beds and 365. A hospital with 200 licensed beds and 56,000 patient days in a year has an occupancy rate of 56,000 / (200 × 365) = 76.7%. Occupancy rate is a key indicator of capacity utilization and financial efficiency: hospitals operating at very low occupancy carry high fixed costs per admission, while hospitals near full capacity face throughput constraints. HCRIS allows occupancy computation at the cost center level (ICU occupancy and routine nursing unit occupancy separately) in addition to the facility-wide rate.
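
The worked example above translates directly into code. The function name `occupancy_rate` is illustrative, and a 365-day reporting period is assumed.

```python
def occupancy_rate(patient_days, beds, days_in_period=365):
    """Patient days as a share of available bed-days in the period."""
    return patient_days / (beds * days_in_period)

rate = occupancy_rate(patient_days=56_000, beds=200)
print(f"{rate:.1%}")  # prints 76.7%
```

For a short or long cost reporting period, `days_in_period` should be the actual number of days covered by the report.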

Cost per discharge and salary per FTE

Total inpatient costs divided by total discharges yields the average cost per discharge, an efficiency metric that, when compared across hospitals of similar case mix and teaching status, reveals structural differences in resource use. Total labor costs (salaries plus benefits) divided by total FTE count yields labor cost per FTE (loosely, salary per FTE), a labor cost benchmark that varies substantially by market, ownership type, and union status. Both metrics require cost report data; neither is available from claims data alone.
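
Both ratios are simple divisions; a minimal sketch with hypothetical inputs and illustrative function names:

```python
def cost_per_discharge(total_inpatient_costs, total_discharges):
    """Average inpatient cost per discharge."""
    return total_inpatient_costs / total_discharges

def labor_cost_per_fte(salaries, benefits, total_fte):
    """Salaries plus benefits, divided by full-time equivalents."""
    return (salaries + benefits) / total_fte

# Hypothetical hospital.
cpd = cost_per_discharge(90_000_000.0, 7_500)
lpf = labor_cost_per_fte(60_000_000.0, 15_000_000.0, 1_000)
print(f"cost per discharge={cpd:,.0f} labor cost per FTE={lpf:,.0f}")
```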

Payer mix

Worksheet S-3 provides patient day counts by payer category: Medicare, Medicaid, commercial insurance, and self-pay. Dividing each payer's days by total patient days yields the hospital's payer mix. Payer mix is the single most important structural determinant of hospital revenue per admission: Medicare pays fixed DRG rates, Medicaid pays below cost in most states, commercial insurers pay the highest rates, and self-pay patients produce the most uncollectible revenue. Safety-net hospitals with high Medicare and Medicaid payer mixes face a structurally different revenue challenge than suburban hospitals with commercial-majority payer mixes.
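
A short sketch of the payer mix calculation, using hypothetical day counts and an illustrative function name:

```python
def payer_mix(days_by_payer):
    """Convert patient days by payer into shares of total patient days."""
    total_days = sum(days_by_payer.values())
    return {payer: days / total_days for payer, days in days_by_payer.items()}

# Hypothetical patient days for one hospital-year.
mix = payer_mix({
    "medicare": 20_000,
    "medicaid": 8_000,
    "commercial": 10_000,
    "self_pay": 2_000,
})
for payer, share in mix.items():
    print(f"{payer:<12} {share:.1%}")
```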

Charity care and uncompensated care

Worksheet S-10 provides charity care costs, bad debt expense, and Medicaid shortfall at the hospital level. These figures underpin the DSH uncompensated care payment formula and also represent the primary audited data source for nonprofit hospital community benefit analysis. Dividing charity care costs by net patient revenue yields the charity care intensity ratio—the fraction of a hospital's revenue base devoted to uncompensated service. Cross-referencing this ratio against operating margin reveals whether hospitals providing high charity care are doing so from a position of financial strength or at the cost of financial distress.
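
The charity care intensity ratio can be sketched as follows, assuming charity care charges are first deflated to cost with the cost-to-charge ratio. All inputs are hypothetical and the function names are illustrative.

```python
def charity_care_cost(charity_charges, ccr):
    """Deflate charity care charges to cost using the cost-to-charge ratio."""
    return charity_charges * ccr

def charity_care_intensity(charity_cost, net_patient_revenue):
    """Charity care cost as a share of net patient revenue."""
    return charity_cost / net_patient_revenue

# Hypothetical hospital.
cost = charity_care_cost(charity_charges=10_000_000.0, ccr=0.30)
intensity = charity_care_intensity(cost, net_patient_revenue=150_000_000.0)
print(f"charity care cost={cost:,.0f} intensity={intensity:.1%}")
```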

Operating margin

Operating income divided by total revenues from Worksheet G yields the operating margin; a true EBITDA margin would also add back depreciation, amortization, and interest. Across the US hospital industry this figure has ranged from approximately −1% to +6% in most years, with wide dispersion: large academic medical centers in concentrated markets post margins of 10% or more, while rural safety-net hospitals frequently operate at negative margins. For-profit hospital chains show high average margins with low dispersion; the nonprofit sector shows high variance driven by the divergence between well-endowed academic centers and financially stressed community hospitals.
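
The Worksheet G ratio can be computed as below. This sketch follows the definition used in this article (operating income over total revenues, including non-operating revenue); the figures and the function name are hypothetical.

```python
def operating_margin(net_patient_revenue, other_operating_revenue,
                     nonoperating_revenue, operating_expenses):
    """Operating income divided by total revenues."""
    operating_income = (net_patient_revenue + other_operating_revenue
                        - operating_expenses)
    total_revenues = (net_patient_revenue + other_operating_revenue
                      + nonoperating_revenue)
    return operating_income / total_revenues

# Hypothetical hospital.
margin = operating_margin(
    net_patient_revenue=180_000_000.0,
    other_operating_revenue=10_000_000.0,
    nonoperating_revenue=10_000_000.0,
    operating_expenses=184_000_000.0,
)
print(f"{margin:.1%}")  # prints 3.0%
```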

Medicare payment implications

Cost report data directly drives four major Medicare payment adjustments that can each represent tens of millions of dollars annually for a large hospital.

Disproportionate Share Hospital payments

The DSH adjustment compensates hospitals that serve disproportionate shares of low-income Medicare and Medicaid patients. The DSH calculation is based on two patient-share fractions derived from cost report data: the Medicaid fraction (Medicaid patient days as a share of total patient days) and the Medicare SSI fraction (Medicare patient days for patients who also receive Supplemental Security Income benefits, as a share of total Medicare patient days). A hospital whose combined fractions exceed the statutory threshold receives an upward DRG payment adjustment ranging from approximately 5% to 35% of base Medicare payments, depending on the fractions' magnitude.
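
The two fractions and their sum can be sketched as below. The patient-day counts are hypothetical, and the 15% cutoff passed in the example is an assumption standing in for the statutory threshold, which depends on hospital type and size.

```python
def dsh_fractions(medicaid_days, total_days, medicare_ssi_days, medicare_days):
    """Return (Medicaid fraction, Medicare SSI fraction, combined percentage)."""
    medicaid_fraction = medicaid_days / total_days
    ssi_fraction = medicare_ssi_days / medicare_days
    return medicaid_fraction, ssi_fraction, medicaid_fraction + ssi_fraction

def exceeds_threshold(combined, threshold):
    """True when the combined fractions are above the qualifying threshold."""
    return combined > threshold

# Hypothetical hospital.
medicaid_frac, ssi_frac, combined = dsh_fractions(
    medicaid_days=9_000, total_days=40_000,
    medicare_ssi_days=1_600, medicare_days=20_000,
)
qualifies = exceeds_threshold(combined, threshold=0.15)  # assumed cutoff
print(f"medicaid={medicaid_frac:.1%} ssi={ssi_frac:.1%} combined={combined:.1%}")
```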

The Affordable Care Act restructured DSH payments in 2014: 25% of the pre-ACA empirically justified DSH amount continues as a permanent baseline payment, while 75% flows through an uncompensated care pool distributed based on audited Worksheet S-10 data. For a large urban safety-net hospital, the combined DSH and uncompensated care payments may reach $50–100 million annually, a meaningful fraction of total Medicare revenue.

Graduate Medical Education payments

Teaching hospitals receive two separate Medicare payments for training residents, both derived from cost report data. Direct Graduate Medical Education (DGME) payments cover Medicare's share of the direct costs of residency training: resident salaries, benefits, and faculty supervision costs, computed from resident FTE counts reported in the cost report. Indirect Medical Education (IME) payments are a percentage add-on to DRG base payments calculated from the hospital's intern-and-resident-to-bed ratio, also from the cost report. IME adds approximately 5.5% to DRG payments for each 0.1 increment in the ratio; a major academic medical center may receive an IME add-on of 25–40% above base DRG payments. Total GME payments across all teaching hospitals exceed $12 billion annually.
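
The rule of thumb quoted above (about 5.5% per 0.1 of resident-to-bed ratio) can be turned into a quick estimate. This is a linear approximation only: the statutory adjustment is a nonlinear function of the ratio, so the sketch overstates the add-on at high ratios. The function name and inputs are hypothetical.

```python
def approx_ime_addon(residents_fte, beds, pct_per_tenth=0.055):
    """Linear rule of thumb: add-on share per 0.1 of resident-to-bed ratio."""
    ratio = residents_fte / beds
    return pct_per_tenth * (ratio / 0.1)

# Hypothetical academic medical center: 300 resident FTEs, 600 beds.
addon = approx_ime_addon(residents_fte=300, beds=600)
print(f"approximate IME add-on: {addon:.1%}")
```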

Capital cost reimbursement

Under the IPPS capital prospective payment system, hospitals receive a capital payment rate per discharge that reflects national average capital costs, adjusted for local market factors. The cost report depreciation schedules and capital asset records provide the basis for hospitals to demonstrate hold-harmless exceptions and transition period adjustments to the standard capital rate. For hospitals that made large capital investments during the hold-harmless transition period in the early 1990s, capital passthrough payments based on actual cost report depreciation may exceed the prospective capital rate—and these settlements flow through the cost report.

Critical Access Hospital cost-based reimbursement

Critical Access Hospitals—small rural hospitals with 25 or fewer acute care beds located more than 35 miles from the nearest hospital—are exempt from IPPS entirely. CAHs receive 101% of allowable Medicare costs for both inpatient and outpatient services, determined entirely through the cost report settlement process. There is no DRG payment mechanism for CAHs: every dollar of Medicare revenue is cost-report-driven. This cost-plus structure means that CAH financial performance in HCRIS looks structurally different from IPPS hospitals—their cost-to-charge ratios are higher (typically 0.7–0.9 rather than 0.2–0.4), and their operating margins are constrained by the cost basis rather than by market pricing power.

Data access

HCRIS data is available through two primary paths:

The CMS Cost Reports download page at https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/Cost-Reports/ publishes annual ZIP files for each provider type and form version: Hospital (Form 2552-10 current, 2552-96 legacy), Skilled Nursing Facility (2540-10), Home Health Agency (1728-94), Hospice (1984-14), and Renal Facility (265-11). Each ZIP contains four relational tables: the RPT table (one row per cost report, with provider number, fiscal year dates, and settlement status), ALPHA tables (worksheet data with text variable names), NMRC tables (worksheet data in numeric format), and RPT_REC (report version tracking). Working with raw CMS files requires joining across these tables using the report number as the key. For longitudinal analysis, the NBER HCRIS database at https://data.nber.org/hcris/ maintains a pre-merged, consistently labeled version from 1996 onward and is the most accessible starting point.
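
The join on report number can be sketched without any external library. Everything in this example is an assumption made for illustration: the field names, the worksheet, line, and column codes, and the values. Check the record layout shipped with the CMS files before relying on any of them.

```python
# Hypothetical report-level rows (one per cost report).
rpt_rows = [
    {"rpt_rec_num": 1001, "provider_ccn": "010001", "fy_end": "2021-09-30"},
    {"rpt_rec_num": 1002, "provider_ccn": "010005", "fy_end": "2021-12-31"},
]
# Hypothetical numeric rows (one per worksheet cell).
nmrc_rows = [
    {"rpt_rec_num": 1001, "worksheet": "G300000", "line": "00300",
     "column": "00100", "value": 250_000_000.0},
    {"rpt_rec_num": 1001, "worksheet": "S300001", "line": "01400",
     "column": "00200", "value": 120.0},
    {"rpt_rec_num": 1002, "worksheet": "G300000", "line": "00300",
     "column": "00100", "value": 90_000_000.0},
]

def extract_item(rows, worksheet, line, column):
    """Map report number to the value of one worksheet cell."""
    cell = (worksheet, line, column)
    return {
        r["rpt_rec_num"]: r["value"]
        for r in rows
        if (r["worksheet"], r["line"], r["column"]) == cell
    }

def join_reports(reports, values, name):
    """Left-join one extracted cell onto the report-level rows."""
    return [{**r, name: values.get(r["rpt_rec_num"])} for r in reports]

revenue = extract_item(nmrc_rows, "G300000", "00300", "00100")
merged = join_reports(rpt_rows, revenue, "net_patient_revenue")
for row in merged:
    print(row["provider_ccn"], row["fy_end"], row["net_patient_revenue"])
```

Reports with no matching cell keep a value of `None`, which mirrors a left join and makes missing worksheet entries visible instead of silently dropping the hospital.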

The data.cms.gov portal provides a pre-extracted summary dataset at https://data.cms.gov/provider-compliance/cost-report/hospital-provider-cost-report with key financial metrics already flattened into a single hospital-year table accessible via the Socrata API. Key fields in this dataset include: provider_ccn (the CMS Certification Number), fiscal_year_end_date, total_beds, total_patient_days, total_charges, total_costs, net_patient_revenue, total_salaries, total_fte, medicare_discharges, medicaid_discharges, and charity_care_costs. No API key is required for read access at standard query volumes. This Socrata endpoint is the most efficient path for the financial metrics most commonly used in policy research, though the raw CMS ZIP files are necessary for worksheet-level detail.

HCRIS has been the gold standard for hospital financial transparency in health policy research for three decades. The dataset underpins the major academic literatures on hospital pricing, consolidation, and financial distress: the RAND Corporation's hospital financial stress analyses, Chapin White's work on hospital price variation, the FTC's hospital merger retrospective research, and the Dartmouth Atlas geographic variation studies all rely on HCRIS as their primary financial data source.

Research applications

The academic and policy literature built on HCRIS covers several major threads. Hospital financial stress and closure risk: RAND Corporation analyses have used HCRIS operating margins, cash-on-hand days, and Medicare payer mix to identify hospitals at risk of closure before closure events occur. Rural hospital distress is particularly tractable in HCRIS—CAH and non-CAH rural hospitals show deteriorating margin trends in longitudinal analysis, and the dataset supports survival analysis of closure predictors.

Hospital consolidation: the FTC's retrospective studies of hospital mergers compare pre- and post-merger prices and costs using HCRIS as the cost benchmark. Because HCRIS covers all hospitals regardless of system affiliation, it enables clean comparison of acquired versus non-acquired hospitals across multiple financial dimensions. Price transparency rule enforcement: researchers comparing CMS Hospital Price Transparency machine-readable files against HCRIS CCR data can compute implied procedure-level margins, testing whether hospitals with high negotiated rates also show lower costs or whether high prices reflect pure market power.

Nonprofit community benefit: cross-referencing IRS Form 990 Schedule H charity care figures against HCRIS Worksheet S-10 charity care costs allows researchers to test consistency between tax-exempt reporting and Medicare cost report reporting, two separately reported datasets that in principle measure the same quantity. Systematic discrepancies reveal either methodological differences in charity care accounting or, in some cases, inconsistencies in self-reporting.

Python example: analyzing HCRIS via the Socrata API

The following script fetches the CMS Hospital Provider Cost Report from the data.cms.gov Socrata API and computes five analyses: median operating margin by hospital type (teaching versus non-teaching versus Critical Access), charity care as a percentage of net revenue by state, staffed beds occupancy rate by state, the top 20 hospitals by net patient revenue, and the distribution of Medicaid payer mix across the hospital universe. No API key is required.

import requests
import pandas as pd
import io

# ---------------------------------------------------------------------------
# CMS Hospital Provider Cost Report
# Source: data.cms.gov Socrata API
# Endpoint: https://data.cms.gov/provider-compliance/cost-report/hospital-provider-cost-report
# No API key required for read access.
# Key fields: provider_ccn, fiscal_year_end_date, total_beds, total_patient_days,
#   total_charges, total_costs, net_patient_revenue, total_salaries, total_fte,
#   medicare_discharges, medicaid_discharges, charity_care_costs,
#   number_of_beds, inpatient_days, teaching_indicator, critical_access_indicator
# ---------------------------------------------------------------------------

SOCRATA_BASE = "https://data.cms.gov/resource"
# Dataset ID for Hospital Provider Cost Report (verify latest at data.cms.gov)
DATASET_ID = "tg9h-gxi4"   # Hospital Provider Cost Report

def fetch_cost_reports(limit: int = 150000) -> pd.DataFrame:
    """
    Fetch the CMS Hospital Provider Cost Report from the Socrata API.
    Returns a DataFrame with one row per hospital per fiscal year.
    """
    url = f"{SOCRATA_BASE}/{DATASET_ID}.csv"
    params = {"$limit": limit, "$order": "fiscal_year_end_date DESC"}
    resp = requests.get(url, params=params, timeout=300)
    resp.raise_for_status()
    df = pd.read_csv(io.StringIO(resp.text), dtype=str, low_memory=False)
    print(f"Loaded {len(df):,} rows, {df['provider_ccn'].nunique():,} unique hospitals")
    return df

def clean_numeric(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Coerce listed columns to float, stripping commas and dollar signs."""
    for col in cols:
        if col in df.columns:
            df[col] = (
                df[col].astype(str)
                .str.replace(r"[\$,]", "", regex=True)
                .pipe(pd.to_numeric, errors="coerce")
            )
    return df

NUMERIC_COLS = [
    "total_beds", "total_patient_days", "total_charges", "total_costs",
    "net_patient_revenue", "total_salaries", "total_fte",
    "medicare_discharges", "medicaid_discharges", "charity_care_costs",
    "number_of_beds", "inpatient_days",
]

def most_recent_year(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the most recent cost report row per hospital."""
    df = df.copy()  # avoid mutating the caller's DataFrame
    df["fiscal_year_end_date"] = pd.to_datetime(
        df["fiscal_year_end_date"], errors="coerce"
    )
    df = df.sort_values("fiscal_year_end_date", ascending=False)
    return df.drop_duplicates(subset=["provider_ccn"], keep="first").copy()

# ---------------------------------------------------------------------------
# (a) Median operating margin by hospital type
# teaching_indicator: "Y" / "N"; critical_access_indicator: "Y" / "N"
# Operating margin = (net_patient_revenue - total_costs) / net_patient_revenue
# ---------------------------------------------------------------------------

def operating_margin_by_type(df: pd.DataFrame) -> None:
    df = df.copy()
    df["operating_margin"] = (
        (df["net_patient_revenue"] - df["total_costs"])
        / df["net_patient_revenue"].replace(0, float("nan"))
    )
    # Build a categorical hospital type label
    def label(row):
        cah = str(row.get("critical_access_indicator", "")).strip().upper()
        teach = str(row.get("teaching_indicator", "")).strip().upper()
        if cah == "Y":
            return "Critical Access"
        if teach == "Y":
            return "Teaching (IPPS)"
        return "Non-teaching (IPPS)"

    df["hospital_type"] = df.apply(label, axis=1)
    summary = (
        df.dropna(subset=["operating_margin"])
        .groupby("hospital_type")["operating_margin"]
        .agg(count="count", median="median", mean="mean", p25=lambda x: x.quantile(0.25), p75=lambda x: x.quantile(0.75))
        .sort_values("median", ascending=False)
    )
    print("\n=== (a) Median Operating Margin by Hospital Type ===")
    print(f"{'Type':<22} {'N':<8} {'Median':<10} {'Mean':<10} {'P25':<10} {'P75'}")
    print("-" * 65)
    for htype, row in summary.iterrows():
        print(
            f"{htype:<22} {row['count']:<8.0f} "
            f"{row['median']:>8.1%}  {row['mean']:>8.1%}  "
            f"{row['p25']:>8.1%}  {row['p75']:>8.1%}"
        )

# ---------------------------------------------------------------------------
# (b) Charity care as % of net patient revenue by state
# ---------------------------------------------------------------------------

def charity_care_by_state(df: pd.DataFrame) -> None:
    df = df.copy()
    state_col = next(
        (c for c in df.columns if "state" in c.lower() and "provider" in c.lower()), None
    ) or next((c for c in df.columns if "state" in c.lower()), None)
    if state_col is None:
        print("State column not found; skipping charity care analysis.")
        return
    df["charity_pct"] = (
        df["charity_care_costs"] / df["net_patient_revenue"].replace(0, float("nan"))
    )
    state_summary = (
        df.dropna(subset=["charity_pct"])
        .groupby(state_col)["charity_pct"]
        .agg(hospitals="count", median_pct="median")
        .sort_values("median_pct", ascending=False)
        .head(15)
    )
    print("\n=== (b) Charity Care as % of Net Revenue (Top 15 States, Median) ===")
    print(f"{'State':<8} {'Hospitals':<12} {'Median Charity %'}")
    print("-" * 35)
    for state, row in state_summary.iterrows():
        print(f"{state:<8} {row['hospitals']:>10.0f}  {row['median_pct']:>12.2%}")

# ---------------------------------------------------------------------------
# (c) Staffed beds occupancy rate by state
# Occupancy = total_patient_days / (total_beds * 365)
# ---------------------------------------------------------------------------

def occupancy_by_state(df: pd.DataFrame) -> None:
    df = df.copy()
    beds_col = "total_beds" if "total_beds" in df.columns else "number_of_beds"
    days_col = "total_patient_days" if "total_patient_days" in df.columns else "inpatient_days"
    if beds_col not in df.columns or days_col not in df.columns:
        print("Beds or days column not found; skipping occupancy analysis.")
        return
    df["occupancy_rate"] = df[days_col] / (df[beds_col].replace(0, float("nan")) * 365)
    df = df[(df["occupancy_rate"] > 0) & (df["occupancy_rate"] <= 1.05)]
    state_col = next(
        (c for c in df.columns if "state" in c.lower() and "provider" in c.lower()), None
    ) or next((c for c in df.columns if "state" in c.lower()), None)
    if state_col is None:
        print("State column not found; skipping occupancy analysis.")
        return
    occ_state = (
        df.dropna(subset=["occupancy_rate"])
        .groupby(state_col)["occupancy_rate"]
        .agg(hospitals="count", median_occ="median")
        .sort_values("median_occ", ascending=False)
    )
    print("\n=== (c) Staffed Beds Occupancy Rate by State (Median) ===")
    print(f"{'State':<8} {'Hospitals':<12} {'Median Occupancy'}")
    print("-" * 35)
    for state, row in occ_state.iterrows():
        print(f"{state:<8} {row['hospitals']:>10.0f}  {row['median_occ']:>14.1%}")

# ---------------------------------------------------------------------------
# (d) Top 20 hospitals by total net patient revenue
# ---------------------------------------------------------------------------

def top_hospitals_by_revenue(df: pd.DataFrame) -> None:
    name_col = next(
        (c for c in df.columns if "hospital_name" in c.lower() or "facility_name" in c.lower()), None
    ) or next((c for c in df.columns if "name" in c.lower()), None)
    state_col = next(
        (c for c in df.columns if "state" in c.lower() and "provider" in c.lower()), None
    ) or next((c for c in df.columns if "state" in c.lower()), None)
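    # Keep only columns that were detected and actually exist, to avoid a KeyError.
    show_cols = [
        c for c in [name_col, state_col, "net_patient_revenue", "total_beds"]
        if c and c in df.columns
    ]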
    top20 = (
        df.dropna(subset=["net_patient_revenue"])
        .nlargest(20, "net_patient_revenue")[show_cols]
    )
    print("\n=== (d) Top 20 Hospitals by Net Patient Revenue ===")
    for _, row in top20.iterrows():
        name = str(row.get(name_col, ""))[:40] if name_col else ""
        state = str(row.get(state_col, "")) if state_col else ""
        rev = row["net_patient_revenue"]
        rev_b = rev / 1e9
        beds = row.get("total_beds", float("nan"))
        beds_str = f"beds={beds:.0f}" if not pd.isna(beds) else ""
        print(f"  {name:<40} {state:<4}  revenue=${rev_b:>6.2f}B  {beds_str}")

# ---------------------------------------------------------------------------
# (e) Medicaid payer mix distribution
# Medicaid payer mix = medicaid_discharges / (medicare_discharges + medicaid_discharges)
# Used as proxy for low-income population served
# ---------------------------------------------------------------------------

def medicaid_payer_mix(df: pd.DataFrame) -> None:
    df = df.copy()
    total_disch = df["medicare_discharges"].fillna(0) + df["medicaid_discharges"].fillna(0)
    df["medicaid_share"] = df["medicaid_discharges"] / total_disch.replace(0, float("nan"))
    df = df.dropna(subset=["medicaid_share"])
    df = df[(df["medicaid_share"] >= 0) & (df["medicaid_share"] <= 1)]
    bins = [0, 0.10, 0.20, 0.30, 0.40, 0.50, 1.01]
    labels = ["0-10%", "10-20%", "20-30%", "30-40%", "40-50%", ">50%"]
    df["medicaid_bin"] = pd.cut(df["medicaid_share"], bins=bins, labels=labels, right=False)
    dist = df["medicaid_bin"].value_counts().sort_index()
    total = dist.sum()
    print("\n=== (e) Medicaid Payer Mix Distribution (% of Government Discharges) ===")
    print(f"{'Medicaid Share':<14} {'Hospitals':<12} {'% of Total'}")
    print("-" * 38)
    for bucket, count in dist.items():
        print(f"{str(bucket):<14} {count:>10,}  {count / total:>10.1%}")
    print(f"\nMedian Medicaid share: {df['medicaid_share'].median():.1%}")
    print(f"75th percentile:       {df['medicaid_share'].quantile(0.75):.1%}")

def main() -> None:
    print("Fetching CMS Hospital Provider Cost Report from data.cms.gov...")
    raw = fetch_cost_reports()
    raw = clean_numeric(raw, NUMERIC_COLS)
    df = most_recent_year(raw)
    print(f"Most-recent-year snapshot: {len(df):,} hospitals")

    operating_margin_by_type(df)
    charity_care_by_state(df)
    occupancy_by_state(df)
    top_hospitals_by_revenue(df)
    medicaid_payer_mix(df)

if __name__ == "__main__":
    main()

The operating margin analysis will typically show Critical Access Hospitals at near-zero or slightly positive margins—an artifact of their cost-plus reimbursement structure—while teaching hospitals in major markets post the widest range, from high-margin academic medical centers to financially stressed urban safety-net systems. The charity care distribution by state reflects Medicaid expansion status as much as hospital generosity: states that expanded Medicaid under the ACA converted many charity care cases into Medicaid-reimbursed cases, mechanically reducing measured charity care as a fraction of revenue without necessarily reducing the underlying volume of low-income care. The Medicaid payer mix distribution illustrates the structural divide in the US hospital system: a large share of hospitals are concentrated below 20% Medicaid, while a long tail of safety-net facilities exceeds 40%, with commensurately different revenue risk profiles.

Part of the Federal Regulatory Data Hub.