
CMS Hospital Compare: The Federal Database Behind Quality Ratings for 5,000 US Hospitals

January 2, 2027 · 17 min read · AI Analytics

CMS · Hospital Compare · Healthcare Quality · Medicare · Federal Data

The CMS Hospital Compare program publishes readmission rates, patient safety indicators, HCAHPS patient experience scores, and payment data for Medicare-certified hospitals across the United States, making it one of the largest public hospital quality disclosure programs in the world.

This article covers the scope and institutional history of Hospital Compare, the Overall Hospital Quality Star Rating methodology and its five component domains, the HCAHPS patient satisfaction survey and its 10 measurement domains, the Hospital Readmissions Reduction Program and its financial penalties, 30-day mortality and complication measures including PSI-90, the HAC Reduction Program, Medicare spending and value-based purchasing payment adjustments, data access through data.cms.gov, and a Python script that downloads and analyzes the datasets.

What Hospital Compare Is

Hospital Compare grew out of a CMS public reporting initiative begun in 2002 in partnership with the Hospital Quality Alliance, a public-private coalition that included hospital associations, physician organizations, and consumer groups. The Medicare Prescription Drug, Improvement, and Modernization Act of 2003 gave the effort statutory force by making the reporting of hospital quality measures a condition of receiving the full annual payment update under the Inpatient Prospective Payment System, and the public Hospital Compare website launched in 2005. Hospitals that did not report quality data faced a 0.4 percentage point reduction in their annual payment update, a modest penalty that nonetheless drove near-universal participation.

The program covers approximately 5,100 facilities across several hospital categories: acute care hospitals (the largest group, including academic medical centers, community hospitals, and specialty hospitals), critical access hospitals (designated rural facilities with 25 or fewer inpatient beds), long-term care hospitals (which must maintain an average Medicare inpatient length of stay greater than 25 days), psychiatric hospitals, and children's hospitals. Each category has its own set of applicable measures and reporting requirements; acute care hospitals face the most extensive reporting obligations and are the primary focus of the star rating system. The Hospital Compare data is distinct from the Nursing Home Compare program (which covers skilled nursing facilities) and the Home Health Compare program (which covers home health agencies), though all three are now accessible through the unified Care Compare portal at medicare.gov/care-compare.

The underlying datasets are published at data.cms.gov/provider-data/topics/hospitals/, where they are available as bulk CSV downloads and through the Provider Data Catalog's query API, with no registration or API key required. The primary linkage identifier across all Hospital Compare datasets is the CMS Certification Number (CCN), labeled Facility ID or Provider Number depending on the file: a six-character identifier assigned to each Medicare-certified provider. For hospitals, the CCN consists of a two-digit state code followed by four digits whose range indicates the facility type and whose value is a sequence number within that range. The CCN is generally stable over time (a facility-type conversion, such as an acute care hospital becoming a critical access hospital, produces a new one) and links Hospital Compare data to the CMS Medicare inpatient payment data, cost reports in the Healthcare Cost Report Information System (HCRIS), and the CMS Open Payments (Sunshine Act) dataset.
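The CCN layout lends itself to a small parsing helper. The sketch below is a hypothetical illustration, not a CMS tool: `parse_ccn` and the facility-type ranges in `HOSPITAL_RANGES` are assumptions based on commonly cited CMS numbering (0001–0879 for short-term acute care hospitals, 1300–1399 for critical access hospitals, and so on) and should be checked against the CMS State Operations Manual before use.

```python
# Hypothetical helper: split a hospital CCN into state code and sequence.
# The ranges below are assumed from commonly cited CMS numbering; verify
# them against the CMS State Operations Manual before relying on them.
HOSPITAL_RANGES = [
    (1, 879, "short-term acute care hospital"),
    (1300, 1399, "critical access hospital"),
    (2000, 2299, "long-term care hospital"),
    (3300, 3399, "children's hospital"),
    (4000, 4499, "psychiatric hospital"),
]


def parse_ccn(ccn):
    """Return (state_code, last_four, facility_category) for a hospital CCN."""
    text = str(ccn).strip().zfill(6)  # restore leading zeros lost upstream
    if len(text) != 6:
        raise ValueError(f"CCN must have 6 characters: {ccn!r}")
    state_code, tail = text[:2], text[2:]
    if not tail.isdigit():
        # Some CCNs (e.g. for hospital sub-units) include a letter;
        # leave those unclassified rather than guess.
        return state_code, tail, "unclassified"
    sequence = int(tail)
    for low, high, label in HOSPITAL_RANGES:
        if low <= sequence <= high:
            return state_code, tail, label
    return state_code, tail, "other / not in assumed ranges"
```

For example, `parse_ccn("051300")` returns `("05", "1300", "critical access hospital")` under these assumed ranges, and an integer such as `10001` is zero-padded to `"010001"` before splitting.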

CMS updates the Hospital Compare datasets on a quarterly schedule, with different measure groups refreshing on different cadences. HCAHPS patient survey data is published quarterly. Readmission and mortality measure data updates annually, typically using a three-year rolling window of Medicare claims. Process-of-care and structural measures update on varying schedules depending on the data source. The Overall Hospital Quality Star Rating is recalculated annually using data from the most recently completed full annual cycle for each measure group.

The Overall Hospital Quality Star Rating

CMS introduced the Overall Hospital Quality Star Rating in July 2016 after several years of development and public comment. The rating system was designed to provide consumers and referring physicians with a single summary metric that synthesizes performance across multiple quality domains, analogous to the Five-Star system introduced for nursing homes in 2008. The star rating runs from 1 to 5, where 1 star indicates performance much below the national average and 5 stars indicates performance much above average. Approximately 30% of hospitals receive 3 stars; the distribution is weighted toward the middle, with roughly equal proportions at 1 and 5 stars.

The composite rating is built from five measure groups, each with a different weight in the final calculation:

Measure group	Weight	What it covers
Mortality	22%	30-day mortality for AMI, HF, pneumonia, COPD, CABG, and stroke
Safety of Care	22%	PSI-90 composite, HAI measures (CLABSI, CAUTI, SSI, MRSA, C. diff), hip/knee arthroplasty complication rate
Readmission	22%	30-day unplanned readmission rates for AMI, HF, pneumonia, COPD, hip/knee, CABG, and a hospital-wide measure
Patient Experience (HCAHPS)	22%	HCAHPS survey responses across 10 domains of patient-reported experience
Timely & Effective Care	12%	Process-of-care measures: ER throughput times, sepsis treatment bundles, imaging efficiency

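Under the methodology CMS adopted for 2021 and later releases, each measure is standardized, the standardized measure scores are averaged within each group, and the group scores are combined using the weights above; when a hospital lacks a group, that group's weight is redistributed proportionally across the groups the hospital does report. Hospitals are then placed into peer groups according to how many measure groups they report (three, four, or five), and k-means clustering within each peer group assigns the 1-to-5 star categories. (Releases from 2016 through 2020 used a latent variable model to produce the group scores.) To receive an overall star rating, a hospital must report at least three measures in each of at least three groups, one of which must be Mortality or Safety of Care; hospitals below this threshold receive a “Not Available” designation. The minimum reporting requirement affects a meaningful minority of hospitals, particularly small critical access hospitals that may not have sufficient volume in specific procedure categories to generate statistically reliable measures.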
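The combination step can be sketched once group scores exist. This sketch assumes the group scores are already standardized; CMS's own pipeline (measure standardization, within-group averaging, peer grouping, and k-means clustering into stars) is not reproduced. It uses the weights from the table above and re-normalizes them across the groups a hospital reports. The eligibility rule in `is_eligible` (at least three groups with at least three measures each, including Mortality or Safety of Care) follows the post-2021 methodology and should be checked against current CMS specifications; all function and group names are hypothetical.

```python
# Hypothetical sketch of the star-rating summary score; not CMS's SAS pack.
GROUP_WEIGHTS = {
    "mortality": 0.22,
    "safety": 0.22,
    "readmission": 0.22,
    "patient_experience": 0.22,
    "timely_effective": 0.12,
}


def is_eligible(measure_counts, min_groups=3, min_measures=3):
    """measure_counts maps group name -> number of measures reported.
    Requires enough qualifying groups, one of them Mortality or Safety."""
    qualifying = {g for g, n in measure_counts.items() if n >= min_measures}
    return (len(qualifying) >= min_groups
            and bool(qualifying & {"mortality", "safety"}))


def summary_score(group_scores):
    """Weighted mean of standardized group scores, re-normalizing the
    weights over the groups actually present in group_scores."""
    present = {g: s for g, s in group_scores.items() if g in GROUP_WEIGHTS}
    if not present:
        raise ValueError("no recognized measure groups")
    total_weight = sum(GROUP_WEIGHTS[g] for g in present)
    return sum(GROUP_WEIGHTS[g] * s for g, s in present.items()) / total_weight
```

Re-normalizing means a hospital missing Timely & Effective Care is scored on the remaining four groups at 25% each rather than being penalized for the gap.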

The star rating has been controversial since its introduction. Safety-net hospitals, which serve disproportionately high shares of Medicaid patients, uninsured patients, and patients with complex social determinants of health, tend to receive lower star ratings on average than hospitals serving more affluent populations. Academic medical centers and major teaching hospitals — which accept the most complex referrals and treat conditions that other hospitals decline — similarly tend to score below their smaller community hospital peers. Rural hospitals face challenges in achieving statistically reliable sample sizes for some measures. CMS has responded with multiple methodology revisions since 2016, including adjustments to the risk-adjustment models for some measures and changes to the grouping methodology, but the correlation between lower star ratings and higher social risk remains a persistent feature of the data.

HCAHPS: The Patient Experience Survey

The Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) is the first nationally standardized, publicly reported survey of patients' hospital experience in the United States. Developed jointly by CMS and the Agency for Healthcare Research and Quality (AHRQ), HCAHPS was endorsed by the National Quality Forum in 2005; beginning in 2007, IPPS hospitals had to collect and submit HCAHPS data to receive their full annual payment update, and CMS first publicly reported the results in 2008. Before HCAHPS, hospitals conducted patient satisfaction surveys using proprietary instruments that varied in content, administration methodology, and sampling approach, making valid cross-hospital comparisons impractical.

The HCAHPS survey instrument contains 29 questions feeding 10 publicly reported measures: Communication with Doctors, Communication with Nurses, Responsiveness of Hospital Staff, Communication about Medicines, Discharge Information, Care Transitions, Cleanliness of Hospital Environment, Quietness of Hospital Environment, Overall Rating of Hospital (a single item on a 0–10 scale), and Willingness to Recommend the Hospital. Each hospital surveys a random sample of adult inpatients discharged after at least one overnight stay, sampling throughout each quarter and using a CMS-approved administration mode. The long-standing modes are mail, telephone, mixed (mail with telephone follow-up), and active interactive voice response; CMS added web-based modes for discharges beginning in 2025. Survey vendors certified by CMS administer the survey on behalf of hospitals; the instrument, sampling methodology, and administration protocols are standardized to allow direct comparison.

Completed-survey volumes vary widely by facility size. CMS asks hospitals to obtain at least 300 completed surveys over each rolling four-quarter reporting period where patient volume allows; because hospitals size their samples to that target, many report a few hundred completed surveys per year, large hospitals may report more, and small critical access hospitals often cannot reach 300 at all. Small samples create statistical reliability challenges: CMS attaches a caution footnote to results based on fewer than 100 completed surveys and does not calculate HCAHPS star ratings for those hospitals, which show “Not Available” for the affected measures.

CMS publishes HCAHPS results in two primary formats. The “top-box” percentage reports the share of patients who selected the most favorable response (“Always” on frequency items, 9–10 on the 0–10 overall rating, or “Definitely yes” for willingness to recommend). The linear mean score converts every response to a 0–100 scale and averages them, so it reflects movement across the full response range rather than only in the top category; CMS uses linear mean scores to calculate the HCAHPS star ratings. The HCAHPS Patient Experience star rating, separate from the Overall Hospital Quality Star Rating, summarizes the HCAHPS results into a 1-to-5-star score that appears on the Care Compare website alongside the composite star rating.
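To make the two formats concrete, here is an unadjusted sketch. It assumes the standard linear scoring convention: a 0–10 response is multiplied by 10, and frequency items map Never/Sometimes/Usually/Always to 0, 33⅓, 66⅔, and 100. Published CMS scores additionally apply patient-mix and survey-mode adjustment, which this sketch omits; the function names are illustrative.

```python
# Unadjusted sketch: published HCAHPS scores also apply patient-mix and
# survey-mode adjustment, which is not reproduced here.
FREQUENCY_POINTS = {
    "never": 0.0,
    "sometimes": 100 / 3,
    "usually": 200 / 3,
    "always": 100.0,
}


def top_box_rating(responses):
    """Percent of 0-10 overall-rating responses that are 9 or 10."""
    return 100 * sum(1 for r in responses if r >= 9) / len(responses)


def linear_mean_rating(responses):
    """Mean of 0-10 overall-rating responses rescaled to 0-100."""
    return sum(10 * r for r in responses) / len(responses)


def linear_mean_frequency(responses):
    """Mean of Never/Sometimes/Usually/Always responses on a 0-100 scale."""
    return sum(FREQUENCY_POINTS[r.lower()] for r in responses) / len(responses)
```

For the responses `[10, 9, 8, 5]`, the top-box score is 50% while the linear mean is 80, which shows how the linear mean registers the 8 and the 5 that the top-box format ignores.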

HCAHPS results are published quarterly, with each release covering the four most recent quarters of available survey data. A hospital's publicly reported HCAHPS scores therefore reflect patient experiences from the preceding four quarters rather than the current quarter, introducing an inherent lag between changes in hospital performance and public reporting. HCAHPS scores are adjusted for survey mode and for patient-mix factors such as age, education level, self-reported health status, and service line, to allow fair comparisons across hospitals with different patient populations.
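The reporting lag can be illustrated with a rolling four-quarter window. This is a simplified sketch that pools unadjusted quarterly top-box rates weighted by completed surveys; it assumes per-quarter inputs are available and ignores the mode and patient-mix adjustment described above.

```python
from collections import deque


def rolling_four_quarter(quarters):
    """quarters: iterable of (label, completed_surveys, top_box_pct) in time
    order. Yields (label_of_latest_quarter, pooled_pct) once four quarters
    are in the window; pooling weights each quarter by its survey count."""
    window = deque(maxlen=4)  # oldest quarter drops out automatically
    for label, n, pct in quarters:
        window.append((n, pct))
        if len(window) == 4:
            total = sum(count for count, _ in window)
            pooled = sum(count * p for count, p in window) / total
            yield label, pooled
```

With four quarters at 70, 72, 74, and 76 followed by a jump to 90, the reported figure moves only from 73 to 78, because three older quarters remain in the window.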

HCAHPS scores are directly tied to hospital reimbursement through the Value-Based Purchasing program. The Person and Community Engagement domain in VBP uses HCAHPS top-box rates across eight dimensions (Cleanliness and Quietness are combined into one); for each dimension a hospital receives achievement points (performance against national thresholds and benchmarks) and improvement points (gain over its own baseline period), and VBP credits the higher of the two. The domain also awards consistency points based on the hospital's lowest-scoring dimension. Person and Community Engagement accounts for 25% of the total VBP performance score.
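The achievement/improvement logic can be sketched with the general VBP point formulas, assuming the achievement threshold (the baseline-period median), the benchmark (the mean of the top decile), and the hospital's own baseline score are already known. The function names are hypothetical, and the sketch omits the HCAHPS consistency points and CMS's rounding rules.

```python
def achievement_points(score, threshold, benchmark):
    """0-10 points: 0 below the threshold, 10 at or above the benchmark."""
    if score >= benchmark:
        return 10.0
    if score < threshold:
        return 0.0
    return 9 * (score - threshold) / (benchmark - threshold) + 0.5


def improvement_points(score, baseline, benchmark):
    """0-9 points for improvement over the hospital's own baseline."""
    if score <= baseline or benchmark <= baseline:
        return 0.0  # no improvement, or nothing left to improve toward
    raw = 10 * (score - baseline) / (benchmark - baseline) - 0.5
    return max(0.0, min(9.0, raw))


def dimension_points(score, baseline, threshold, benchmark):
    """VBP credits the higher of achievement and improvement points."""
    return max(achievement_points(score, threshold, benchmark),
               improvement_points(score, baseline, benchmark))
```

A dimension score of 80 against a threshold of 70 and a benchmark of 90 earns 5.0 achievement points; with a baseline of 75, improvement would earn about 2.83, so the hospital is credited 5.0.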

Readmissions, Mortality, and the HRRP

The Hospital Readmissions Reduction Program (HRRP), enacted as part of the Affordable Care Act and effective October 1, 2012, penalizes hospitals with excess 30-day unplanned readmissions for specific conditions. For each covered condition, CMS uses Medicare claims to estimate the number of readmissions predicted for the hospital given its own performance and patient case mix, divides that by the number expected if the same patients had been treated at an average hospital, and reports the result as an excess readmission ratio (ERR). Hospitals with an ERR above the comparison value (1.0 in the program's original design; since FY2019, the median ERR of the hospital's peer group, defined by its share of patients dually eligible for Medicare and Medicaid) face a reduction in their base operating DRG payments for all Medicare inpatient discharges, not just for the covered conditions.

The HRRP currently covers six condition and procedure categories: acute myocardial infarction (AMI / heart attack), heart failure (HF), pneumonia, chronic obstructive pulmonary disease (COPD), elective primary total hip arthroplasty and/or total knee arthroplasty (THA/TKA), and coronary artery bypass graft surgery (CABG). The maximum payment penalty is 3% of the hospital's base operating DRG payments; because the reduction applies to every discharge, a hospital at the maximum loses 3 cents of every base Medicare inpatient payment dollar across all DRGs. CMS estimated that the HRRP reduced Medicare readmission-related costs by approximately $10 billion between 2013 and 2022 across all hospitals subject to the program.
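The penalty arithmetic can be sketched as follows, following the structure of the HRRP payment adjustment factor: for each condition where a hospital's ERR exceeds the comparison value, that condition's base operating DRG payments are multiplied by the excess, the products are summed and divided by total base operating DRG payments, and the resulting reduction is capped at 3%. The FY2019-onward peer-group median and budget-neutrality modifier are parameters here, defaulting to 1.0 (the original design); the function name and all dollar figures are illustrative.

```python
def hrrp_adjustment_factor(conditions, total_base_payments,
                           neutrality_modifier=1.0, cap=0.03):
    """conditions: list of dicts with 'payments' (base operating DRG
    payments for the condition), 'err' (excess readmission ratio), and an
    optional 'peer_median' (defaults to 1.0, the pre-FY2019 comparison).
    Returns the multiplier applied to all base operating DRG payments."""
    excess_dollars = 0.0
    for c in conditions:
        excess = c["err"] - c.get("peer_median", 1.0)
        if excess > 0:  # only conditions with excess readmissions count
            excess_dollars += c["payments"] * excess
    reduction = excess_dollars * neutrality_modifier / total_base_payments
    return 1.0 - min(reduction, cap)
```

With $1.0 million in AMI payments at an ERR of 1.10, $2.0 million in heart failure payments at an ERR of 0.95 (no excess), and $20 million in total base payments, the reduction is 0.5% and the factor is 0.995.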

The readmission measures used in Hospital Compare and HRRP are calculated from Medicare fee-for-service claims using a 30-day window: a patient counts as readmitted if, within 30 days of discharge from the index hospitalization, they are admitted to any acute care hospital for any cause other than a planned readmission. CMS risk-adjusts each readmission measure for patient age, sex, and comorbidities recorded in Medicare claims prior to the index admission. The measures are reported as predicted-to-expected ratios (the ERR used by HRRP) and as risk-standardized readmission rates, expressed as percentages.
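The relationship between the ratio and the published rate is simple once the model outputs exist. In CMS's hierarchical modeling approach, the predicted count uses the hospital's own estimated effect and the expected count uses an average hospital's effect for the same patients; this sketch assumes both counts and the national observed rate are given (fitting the hierarchical logistic regression itself is out of scope), and the function names are illustrative.

```python
def excess_readmission_ratio(predicted, expected):
    """Predicted-to-expected ratio; above 1.0 means more readmissions than
    an average hospital would have had with the same patients."""
    return predicted / expected


def risk_standardized_rate(predicted, expected, national_rate):
    """Scale the national observed readmission rate by the P/E ratio."""
    return excess_readmission_ratio(predicted, expected) * national_rate
```

A hospital with 120 predicted and 100 expected readmissions has a ratio of 1.2; if the national rate is 15%, its risk-standardized rate is 18%.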

Parallel 30-day mortality measures cover AMI, heart failure, pneumonia, COPD, CABG, and stroke; elective hip and knee arthroplasty is tracked through a separate complication measure rather than a mortality measure. The mortality measures are calculated and risk-adjusted using the same modeling approach as the readmission measures. They count deaths from any cause within 30 days of hospital admission, whether the death occurs in the hospital or after discharge. Because the measures follow Medicare patients for 30 days regardless of discharge setting, a hospital that discharges patients to hospice has those patients' deaths counted in its mortality measure, which has been a persistent methodological criticism of the program.

Patient Safety: PSI-90 and HAC Reduction

The Safety of Care measure group in the star rating draws on two distinct data systems: the AHRQ Patient Safety Indicator composite (PSI-90) derived from inpatient claims, and Healthcare-Associated Infection (HAI) measures reported through the CDC's National Healthcare Safety Network (NHSN).

PSI-90, the Patient Safety and Adverse Events Composite, aggregates ten individual Patient Safety Indicators drawn from Medicare claims: pressure ulcer (PSI 03), iatrogenic pneumothorax (PSI 06), in-hospital fall-associated fracture (PSI 08, originally limited to hip fracture), postoperative hemorrhage or hematoma (PSI 09), postoperative acute kidney injury requiring dialysis (PSI 10), postoperative respiratory failure (PSI 11), perioperative pulmonary embolism or deep vein thrombosis (PSI 12), postoperative sepsis (PSI 13), postoperative wound dehiscence (PSI 14), and abdominopelvic accidental puncture or laceration (PSI 15). Central venous catheter-related bloodstream infection (PSI 07) was part of the original composite but is not included in the modified version. Each component indicator identifies potentially preventable complications using ICD-10-CM/PCS codes in Medicare inpatient claims. The composite is a weighted average of the component indicators, with weights reflecting both how often each event occurs and the harm associated with it.
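Conceptually, the composite is a weighted average of component ratios. This sketch assumes each component has already been converted to a risk-adjusted, reliability-adjusted ratio (the AHRQ software does that work); the component keys and weights in the test values are made up for illustration and are not AHRQ's published weights.

```python
def psi90_composite(component_ratios, weights):
    """Weighted average of component indicator ratios.

    component_ratios and weights are dicts keyed by component ID; weights
    are normalized by their sum, so they need not add up to exactly 1.
    """
    missing = set(weights) - set(component_ratios)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[k] * component_ratios[k] for k in weights) / total_weight
```

Because frequent, high-harm events carry larger weights, a hospital's composite can move substantially on one heavily weighted component even when the rest are near 1.0.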

The HAI measures reported through NHSN cover: central line-associated bloodstream infections (CLABSI), catheter-associated urinary tract infections (CAUTI), surgical site infections (SSI) following colon surgery and abdominal hysterectomy, methicillin-resistant Staphylococcus aureus (MRSA) bacteremia, and Clostridioides difficile infections. Each is expressed as a Standardized Infection Ratio (SIR), comparing observed infections to the number predicted given the hospital's patient population and procedure volume based on NHSN national baseline data. SIRs below 1.0 indicate better-than-predicted performance; SIRs above 1.0 indicate worse-than-predicted performance.
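The SIR itself is a plain ratio; the hard part, the predicted count from NHSN's risk models, is taken as an input in this sketch. The `min_predicted` threshold of 1.0 reflects NHSN's practice of not calculating an SIR when fewer than one infection is predicted; the function name is illustrative.

```python
def standardized_infection_ratio(observed, predicted, min_predicted=1.0):
    """Observed / predicted infections.

    Returns None when the predicted count is below min_predicted, since a
    ratio built on a very small prediction is too unstable to report.
    """
    if predicted < min_predicted:
        return None
    return observed / predicted
```

Three observed CLABSIs against 4.0 predicted gives an SIR of 0.75 (better than predicted), while a hospital with only 0.8 predicted infections gets no SIR at all.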

The HAC Reduction Program, separate from but related to Hospital Compare, penalizes hospitals in the worst-performing quartile on hospital-acquired conditions with a 1% reduction in their Medicare inpatient payments for the applicable fiscal year. Each hospital's Total HAC Score is calculated from the PSI-90 measure and five NHSN HAI measures (CLABSI, CAUTI, colon and abdominal hysterectomy SSI combined, MRSA bacteremia, and C. difficile); since FY2020, CMS has weighted the measures equally, averaging Winsorized z-scores across the measures a hospital reports. Hospitals with a Total HAC Score above the 75th percentile (the worst quartile) receive the 1% penalty; there is no exception for small or rural hospitals, though critical access hospitals, which are not paid under IPPS, are excluded from the program. By statute, the HAC reduction is determined after the HRRP and VBP adjustments are applied, and the programs operate independently, so a hospital can be penalized under all three in the same fiscal year.
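The equal-weight scoring can be sketched with Winsorized z-scores. This sketch assumes every hospital reports every measure, Winsorizes at the 5th and 95th percentiles, uses linear-interpolation percentiles and the population standard deviation, and treats higher values as worse; CMS's exact percentile method and missing-measure handling may differ, and the measure names are placeholders.

```python
from statistics import mean, pstdev


def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 1]."""
    s = sorted(values)
    pos = (len(s) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def winsorized_z(values, lower=0.05, upper=0.95):
    """Clip values at the given percentiles, then convert to z-scores."""
    lo, hi = percentile(values, lower), percentile(values, upper)
    clipped = [min(max(v, lo), hi) for v in values]
    mu, sd = mean(clipped), pstdev(clipped)
    return [(v - mu) / sd if sd else 0.0 for v in clipped]


def total_hac_scores(measures):
    """measures maps a measure name to per-hospital values in a fixed
    hospital order (higher = worse). Returns each hospital's equally
    weighted mean z-score."""
    z_lists = [winsorized_z(vals) for vals in measures.values()]
    return [mean(z[i] for z in z_lists) for i in range(len(z_lists[0]))]


def worst_quartile(scores):
    """Flag hospitals whose Total HAC Score is above the 75th percentile."""
    cutoff = percentile(scores, 0.75)
    return [s > cutoff for s in scores]
```

Winsorizing before standardizing keeps one extreme outlier from stretching the standard deviation and compressing everyone else's z-scores toward zero.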

Payment Data: MSPB and Value-Based Purchasing

Hospital Compare publishes two categories of payment-related data: the Medicare Spending per Beneficiary (MSPB) measure, which captures relative episode-level spending efficiency, and the Value-Based Purchasing (VBP) Total Performance Score and payment adjustment, which summarizes each hospital's overall performance across the VBP quality domains.

The MSPB measure calculates the ratio of Medicare spending attributable to a hospital for a standardized episode of care — defined as the period from three days before the inpatient admission through 30 days after discharge — compared to the national median spending for similar episodes. MSPB spending is price-standardized (adjusted for geographic wage index differences and other payment adjustments) and risk-adjusted for patient age, sex, and comorbidity burden. A hospital with an MSPB ratio below 1.0 is spending less than the national median for comparable episodes; an MSPB above 1.0 indicates above-median episode spending. The MSPB measure covers all inpatient discharges for Medicare fee-for-service beneficiaries and is not limited to specific condition categories.
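The ratio can be sketched in two steps, assuming price-standardized observed spending and model-based expected spending are already available for each episode: average the hospital's observed-to-expected episode ratios and scale by national average episode spending to get the hospital's MSPB amount, then divide by the national median of those amounts. This follows the general structure of CMS's MSPB methodology; episode construction, exclusions, and the risk model itself are not modeled, and the function names are illustrative.

```python
from statistics import mean, median


def mspb_amount(observed, expected, national_avg_spend):
    """Hospital MSPB amount: mean observed/expected episode ratio times the
    national average episode spending."""
    ratios = [o / e for o, e in zip(observed, expected)]
    return mean(ratios) * national_avg_spend


def mspb_measure(hospital_amount, all_hospital_amounts):
    """Hospital MSPB amount divided by the national median MSPB amount."""
    return hospital_amount / median(all_hospital_amounts)
```

Two episodes costing $10,000 and $20,000 against expected values of $12,500 and $20,000 give a mean ratio of 0.9; with national average spending of $20,000 the MSPB amount is $18,000, and against a national median of $20,000 the measure is 0.9, meaning below-median spending.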

Inpatient Prospective Payment System (IPPS) payments also include Disproportionate Share Hospital (DSH) adjustments for hospitals serving high proportions of low-income patients. Since FY2014, under the Affordable Care Act, a qualifying hospital receives 25% of the amount the traditional DSH formula would pay (the “empirically justified” DSH payment); the remaining 75%, reduced to reflect changes in the national uninsured rate, funds a separate uncompensated care pool that CMS distributes according to each hospital's share of national uncompensated care. Large urban teaching hospitals with high Medicaid and uninsured patient shares receive DSH adjustments that can add several percentage points to their total Medicare payment rate; without these adjustments, safety-net hospitals would face severe financial pressure from the combination of lower commercial payer rates, higher uncompensated care burdens, and similar fixed operating costs.
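The ACA-era split can be sketched in a few lines. The inputs (a hospital's traditional DSH amount, the national traditional DSH total, a factor reflecting the change in the uninsured rate, and the hospital's share of national uncompensated care) are assumed to be given, as CMS publishes or computes them in the annual IPPS rulemaking; the function name and the values in the example are invented.

```python
def dsh_payments(hospital_traditional_dsh, national_traditional_dsh,
                 uninsured_factor, hospital_uc_share):
    """Return (empirically_justified_dsh, uncompensated_care_payment).

    - empirically justified DSH: 25% of the hospital's traditional amount
    - uncompensated care pool: 75% of national traditional DSH, scaled by
      the uninsured-change factor
    - the hospital's uncompensated care payment: its share of that pool
    """
    empirically_justified = 0.25 * hospital_traditional_dsh
    uc_pool = 0.75 * national_traditional_dsh * uninsured_factor
    return empirically_justified, uc_pool * hospital_uc_share
```

A hospital with a $1 million traditional DSH amount and a 0.01% share of national uncompensated care, in a year with $10 billion in national traditional DSH and an uninsured factor of 0.7, would receive $250,000 in empirically justified DSH plus $525,000 from the uncompensated care pool.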

The Value-Based Purchasing program, effective fiscal year 2013, redistributes a portion of total IPPS payments based on hospital performance scores. CMS withholds 2% of each hospital's base operating DRG payments into a VBP pool, then pays the pooled funds back to hospitals based on their Total Performance Score (TPS). The TPS combines four equally weighted domains (25% each): Clinical Outcomes, Safety, Efficiency and Cost Reduction, and Person and Community Engagement (primarily HCAHPS). CMS translates each TPS into a payment adjustment factor using a linear exchange function whose slope is set so that total incentive payments equal the total amount withheld. Hospitals above the break-even point on that function earn back more than they contributed; hospitals below it earn back less, effectively subsidizing higher-performing institutions. Because the break-even point depends on the national distribution of scores, it does not sit at a fixed percentile. Net adjustments have generally ranged from a reduction of roughly 1% to a gain of roughly 3%, varying by fiscal year; the theoretical floor is the loss of the full 2% withhold.
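The budget-neutral exchange function can be sketched directly: withhold 2% from every hospital, then choose the slope so that the total earned back equals the total withheld. This assumes each hospital's base payments and TPS (0–100) are known; CMS estimates these inputs during rulemaking, and the function name and test values are illustrative.

```python
def vbp_adjustment_factors(hospitals, withhold=0.02):
    """hospitals: list of (base_payments, tps) with tps on a 0-100 scale.
    Returns one payment adjustment factor per hospital; the exchange
    function slope makes total incentive payments equal the total withheld."""
    withheld = [withhold * base for base, _ in hospitals]
    earned_per_unit_slope = sum(w * tps / 100
                                for w, (_, tps) in zip(withheld, hospitals))
    if earned_per_unit_slope == 0:
        raise ValueError("all TPS values are zero")
    slope = sum(withheld) / earned_per_unit_slope
    # Factor = 1 - withhold + (amount earned back as a share of payments)
    return [1 - withhold + withhold * slope * tps / 100 for _, tps in hospitals]
```

For two hospitals with equal base payments and TPS values of 20 and 60, the slope is 2.5 and the factors are 0.99 and 1.01. The break-even TPS (a factor of exactly 1.0) is 100 divided by the slope, here 40, and it moves with the distribution of scores.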

Data Access

CMS publishes Hospital Compare datasets through the Provider Data Catalog at data.cms.gov/provider-data/topics/hospitals/. All datasets are available as bulk CSV downloads and through the catalog's datastore query API (endpoints of the form data.cms.gov/provider-data/api/1/datastore/query/<dataset-id>/0, the pattern the script below uses). No API key is required, and the API supports query parameters for filtering and pagination. The primary datasets and their contents are:

Dataset	Records	Key fields
General Information	~5,100 hospitals	CCN, hospital name, address, hospital type, ownership type, emergency services, phone number, county
Overall Hospital Quality Star Rating (columns within General Information)	~4,000 rated hospitals	CCN, overall star rating (1–5 or Not Available), count of measures reported per group
HCAHPS – Hospital Consumer Assessment	~4,800 hospitals × HCAHPS measures	CCN, HCAHPS measure ID, top-box %, linear mean score, number of completed surveys, footnote
Complications and Deaths	~4,500 hospitals × measures	CCN, measure ID, compared-to-national, score, denominator, footnote
Unplanned Hospital Visits	~4,500 hospitals × measures	CCN, measure ID, compared-to-national, score, denominator, footnote
Hospital Readmissions Reduction Program	IPPS hospitals × 6 conditions	CCN, measure name, excess readmission ratio, predicted rate, expected rate, number of discharges
Healthcare-Associated Infections	~5,000 hospitals × 6 HAIs	CCN, measure ID (CLABSI, CAUTI, SSI, MRSA, C. diff), SIR score, compared-to-national
Payment and Value of Care	~4,500 hospitals × payment measures	CCN, payment measure ID (30-day episode payments for AMI, HF, pneumonia, hip/knee), payment, payment category (higher/no different/lower than national), value-of-care category
Medicare Spending Per Beneficiary	IPPS hospitals	CCN, MSPB ratio (hospital MSPB amount relative to the national median)
Timely & Effective Care	~5,000 hospitals × measures	CCN, measure ID, score, denominator, sample size, footnote

The CCN (Provider Number) is the join key across all Hospital Compare datasets and also links to the CMS Medicare Inpatient Provider Charge data, the CMS Medicare Outpatient Provider Charge data, the HCRIS cost reports, and the CMS Open Payments (Sunshine Act) dataset. When joining Hospital Compare data to IPPS payment data, confirm that the CCN format is consistent across datasets (some files use a leading-zero six-digit format; others may omit leading zeros). Converting all CCNs to zero-padded six-digit strings before joining prevents silent mismatches.
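A small standard-library sketch of the normalization step follows, assuming CCNs may arrive as strings, integers, or float-formatted strings such as "10001.0" (a common artifact when a spreadsheet or CSV reader parses the column as numeric). In pandas, reading the column with `dtype=str` avoids the problem at the source. Both helper names are hypothetical.

```python
def normalize_ccn(raw):
    """Return a CCN as a six-character, zero-padded, upper-case string.
    Handles values that were parsed as numbers (e.g. 10001 or '10001.0')."""
    text = str(raw).strip().upper()
    if text.endswith(".0"):  # float artifact from numeric parsing
        text = text[:-2]
    return text.zfill(6)


def unmatched_ccns(left, right):
    """CCNs present in `left` but not in `right` after normalization;
    checking this before a join exposes silent mismatches."""
    right_set = {normalize_ccn(c) for c in right}
    return sorted({normalize_ccn(c) for c in left} - right_set)
```

Running `unmatched_ccns` in both directions before a merge gives a quick count of hospitals that would drop out of an inner join.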

Dataset identifiers for Hospital Compare datasets change when CMS publishes new major releases. If the endpoint URLs in the Python script below return 404 errors, navigate to data.cms.gov/provider-data/topics/hospitals/, locate the current version of the relevant dataset, and copy the updated CSV download URL from the dataset page. Column names are generally stable across quarterly releases; the script uses flexible string matching to detect column names and handle minor naming variations.

Python: Analyzing CMS Hospital Compare Data

The following script downloads the CMS Hospital General Information dataset, which carries each hospital's overall star rating alongside its type and ownership, computes star-rating distributions by hospital type and ownership category, and lists the first 20 five-star hospitals alphabetically by state. It then downloads the HCAHPS patient survey dataset, joins the overall-rating linear mean score to the star ratings on CCN, and summarizes that score by star tier. The script requires requests and pandas; no API key is needed.

import io
import re

import pandas as pd
import requests

# ---------------------------------------------------------------------------
# CMS Hospital Compare: General Information (includes the overall star
# rating) + HCAHPS patient survey results
# Source: data.cms.gov Provider Data Catalog query API (no API key required)
# ---------------------------------------------------------------------------
# Dataset IDs change with major releases. If a URL returns 404, copy the
# current CSV download URL from the dataset page on data.cms.gov.

# General Information -- name, address, type, ownership, overall star rating
GENERAL_INFO_URL = (
    "https://data.cms.gov/provider-data/api/1/datastore/query/"
    "xubh-q36u/0/download?format=csv"
)

# HCAHPS Patient Survey -- one row per hospital per HCAHPS measure
HCAHPS_URL = (
    "https://data.cms.gov/provider-data/api/1/datastore/query/"
    "dgck-syfz/0/download?format=csv"
)

# Possible (normalized) names for the CCN column across releases
CCN_CANDIDATES = ("facility_id", "provider_id", "provider_number", "ccn")


def normalize(name):
    """Lower-case a column name and collapse spaces/punctuation to '_'."""
    return re.sub(r"[^0-9a-z]+", "_", str(name).strip().lower()).strip("_")


def find_col(frame, *candidates):
    """Return the first candidate column present in frame, or None."""
    for cand in candidates:
        if cand in frame.columns:
            return cand
    return None


def download_csv(url, label):
    """Download a CSV with every column read as text and headers normalized.

    Reading as str keeps leading zeros in CCNs; numeric columns are
    converted explicitly where they are used.
    """
    print(f"Downloading {label}...")
    resp = requests.get(url, timeout=120)
    resp.raise_for_status()
    frame = pd.read_csv(io.StringIO(resp.text), dtype=str)
    frame.columns = [normalize(c) for c in frame.columns]
    print(f"{label}: {len(frame):,} rows, {frame.shape[1]} columns")
    return frame


df_info = download_csv(GENERAL_INFO_URL, "General Information")

# ---------------------------------------------------------------------------
# Locate columns (names vary by release, e.g. "Facility ID" vs. "Provider ID")
# ---------------------------------------------------------------------------
id_col = find_col(df_info, *CCN_CANDIDATES)
star_col = find_col(df_info, "hospital_overall_rating", "overall_rating")
if id_col is None or star_col is None:
    raise ValueError("Could not locate the CCN or star-rating column. "
                     "Columns seen: " + str(df_info.columns[:15].tolist()))

type_col = find_col(df_info, "hospital_type")
own_col = find_col(df_info, "hospital_ownership", "ownership")
name_col = find_col(df_info, "facility_name", "hospital_name")
city_col = find_col(df_info, "city_town", "citytown", "city")
state_col = find_col(df_info, "state", "provider_state")

df = df_info.rename(columns={id_col: "provider_id"})
# Zero-pad CCNs to six characters so later joins cannot silently miss
df["provider_id"] = df["provider_id"].str.strip().str.zfill(6)
# "Not Available" and blanks become NaN
df["overall_star_rating"] = pd.to_numeric(df[star_col], errors="coerce")
rated = df.dropna(subset=["overall_star_rating"])
print(f"\n{len(df):,} hospitals, {len(rated):,} with an overall star rating")


def print_avg_by(group_col, title):
    """Print the mean star rating and hospital count for each category."""
    summary = (
        rated.groupby(group_col)["overall_star_rating"]
        .agg(avg_stars="mean", n_hospitals="count")
        .sort_values("avg_stars", ascending=False)
    )
    print(f"\n=== Average Star Rating by {title} ===")
    print(f"  {title:<45}  {'Avg Stars':>10}  {'N Hospitals':>12}")
    print("  " + "-" * 72)
    for category, row in summary.iterrows():
        print(f"  {str(category):<45}  {row['avg_stars']:>10.2f}  "
              f"{int(row['n_hospitals']):>12,}")


# ---------------------------------------------------------------------------
# Part 1: Star-rating distribution, then average by hospital type
# ---------------------------------------------------------------------------
STAR_LABEL = {
    1: "1-star (much below avg)",
    2: "2-star (below avg)",
    3: "3-star (average)",
    4: "4-star (above avg)",
    5: "5-star (much above avg)",
}

print("\n=== Overall Star Rating Distribution (rated hospitals) ===")
overall_dist = rated["overall_star_rating"].astype(int).value_counts().sort_index()
total_rated = overall_dist.sum()
print(f"  {'Stars':<28}  {'Count':>7}  {'Share':>7}")
print("  " + "-" * 46)
for stars, count in overall_dist.items():
    pct = count / total_rated * 100
    bar = "#" * int(pct / 2)
    print(f"  {STAR_LABEL.get(stars, str(stars)):<28}  {count:>7,}  {pct:>6.1f}%  {bar}")
print(f"  {'TOTAL':<28}  {total_rated:>7,}")

if type_col:
    print_avg_by(type_col, "Hospital Type")

# ---------------------------------------------------------------------------
# Part 2: Average star rating by ownership type
# ---------------------------------------------------------------------------
if own_col:
    print_avg_by(own_col, "Ownership Type")

# ---------------------------------------------------------------------------
# Part 3: Five-star hospitals (first 20, alphabetical by state, then name)
# ---------------------------------------------------------------------------
five_star = rated[rated["overall_star_rating"] == 5]
print(f"\n=== 5-Star Hospitals: {len(five_star):,} total ===")

display_cols = [c for c in (name_col, city_col, state_col, type_col, own_col) if c]
sort_cols = [c for c in (state_col, name_col) if c]
if display_cols:
    listing = five_star[display_cols]
    if sort_cols:
        listing = listing.sort_values(sort_cols)
    print("\nFirst 20 five-star hospitals (alphabetical by state, then name):")
    print(listing.head(20).to_string(index=False))

# ---------------------------------------------------------------------------
# Part 4: HCAHPS overall-rating linear mean score vs. overall star rating
# ---------------------------------------------------------------------------
df_hcahps = download_csv(HCAHPS_URL, "HCAHPS")
h_id = find_col(df_hcahps, *CCN_CANDIDATES)
h_measure = find_col(df_hcahps, "hcahps_measure_id", "measure_id")
h_linear = find_col(df_hcahps, "hcahps_linear_mean_value", "linear_mean_value")

if h_id and h_measure and h_linear:
    # These rows carry the linear mean of the 0-10 overall hospital rating
    linear_rows = df_hcahps[
        df_hcahps[h_measure].str.strip() == "H_HSP_RATING_LINEAR_SCORE"]
    scores = pd.DataFrame({
        "provider_id": linear_rows[h_id].str.strip().str.zfill(6),
        "hcahps_score": pd.to_numeric(linear_rows[h_linear], errors="coerce"),
    }).dropna()
    merged = rated.merge(scores, on="provider_id", how="inner")
    grouped = (
        merged.groupby("overall_star_rating")["hcahps_score"]
        .agg(["mean", "median", "count"])
        .round(2)
    )
    print(f"\n=== HCAHPS Overall-Rating Linear Mean vs. Star Rating "
          f"({len(merged):,} hospitals matched) ===")
    print(f"  {'Stars':>6}  {'Avg HCAHPS':>12}  {'Median':>8}  {'N':>6}")
    print("  " + "-" * 40)
    for stars, row in grouped.iterrows():
        print(f"  {int(stars):>6}  {row['mean']:>12.2f}  "
              f"{row['median']:>8.2f}  {int(row['count']):>6,}")
else:
    print("\nHCAHPS measure-ID or linear-mean column not found in this release.")
    print("Check column names on the HCAHPS dataset page at data.cms.gov/provider-data.")

print("\nDone. Source: CMS Hospital Compare via data.cms.gov/provider-data")
print("Care Compare public portal: https://www.medicare.gov/care-compare/")

The HCAHPS portion of the script keeps only rows whose measure ID is H_HSP_RATING_LINEAR_SCORE, which carry the linear mean score for the 0–10 Overall Hospital Rating item, and joins them to the General Information data on zero-padded CCN. If a future release renames the measure-ID or linear-mean columns, the script prints a message instead of failing; in that case, inspect the column list on the HCAHPS dataset page at data.cms.gov/provider-data/topics/hospitals/ and add the new names to the candidate lists passed to find_col.

Part of the Federal Regulatory Data Hub.