Healthcare Consolidation: Tracing Hospital and Nursing-Home Roll-Ups Through Federal Data

Consolidation is the quiet, defining force in American healthcare—a chain buys a nursing home, a private-equity fund rolls up a hundred hospices, a REIT takes the real estate out from under a hospital system—and almost none of it shows up on the sign outside. But it leaves a trace. CMS records who owns a facility, the transactions that change that ownership, and the staffing and outcomes that follow. The trouble is that those records live in separate files that do not share a key, so the abstract trend of “consolidation” only becomes a traceable, facility-level story once you bridge them. This article is the field guide to that bridge.

This article starts with the four CMS datasets that anchor the work: the provider-ownership files, the hospital all-owners file, the change-of-ownership (CHOW) transactions, and the Care Compare quality datasets. It then states the central question consolidation research exists to answer, which is what happens to quality after a chain or a private-equity firm takes over. Next comes the join problem at the heart of the task: the quality data is keyed by the CMS Certification Number (CCN), while the ownership and CHOW data are keyed by PECOS enrollment and associate IDs that carry no CCN. From there it covers the enrollment-to-CCN crosswalk that bridges the two halves; owner-name normalization, the second hard problem, which collapses layered holding companies into a common parent; how to assemble a single facility's ownership and deal history and attach its quality and staffing through the bridge; and how to roll facilities up to a common parent to measure a chain's footprint against its average quality. It closes with a worked Python workflow over the genuine data.cms.gov API and the caveats (self-reporting, broken chains, name fragmentation, and CCN reuse) that every analyst must internalize before drawing conclusions.

The four datasets that anchor the work

Four CMS records, all public and key-free on data.cms.gov, carry the raw material for any serious account of healthcare consolidation. The first is the provider-ownership files: the individuals and organizations with an ownership or control interest in skilled nursing facilities, home health agencies, and hospices, published under the disclosure rules at 42 CFR 455.104. The second is the hospital all-owners file, the parallel record for acute-care hospitals. The third is the change-of-ownership, or CHOW, records—the sale and transfer transactions themselves, the moments at which control of a facility passes from one party to another. The fourth is the Care Compare quality datasets: the Five-Star ratings, the payroll-based staffing levels, and the outcome measures that CMS publishes for nursing homes and hospitals. Each answers a different question—who owns it, how the ownership changed, and how the facility performs—and consolidation analysis needs all four at once.

In our catalog these are stored, already parsed and keyed, as cms_provider_ownership (the SNF, home-health, and hospice owners), cms_hospital_owners (acute-care hospital ownership), cms_provider_chow (the change-of-ownership transactions), and cms_quality (the skilled-nursing and hospital Care Compare measures). The value of having them pre-parsed and keyed is that the work shifts. It is no longer the drudgery of parsing four heterogeneous CMS files; it becomes the genuinely hard and interesting part—the cross-walk that ties enrollment IDs to CCNs, and the owner-name normalization that collapses a sponsor's layered single-purpose entities back into a common parent. Those two operations are the whole game, and the rest of this article is about them.

The question consolidation research exists to answer

There is a reason to do this work, and it is the central question in health-policy research on the sector: what happens to quality after a private-equity firm or a large chain takes over a facility? Consolidation is not, in itself, a verdict—a well-capitalized operator can rescue a failing facility, and scale can fund better systems. But the accumulated evidence on financialized post-acute care points in a consistent and troubling direction, and the question only yields to data that connects the owner to the outcome.

The landmark work is the National Bureau of Economic Research study by Atul Gupta, Sabrina Howell, Constantine Yannelis, and Abhinav Gupta, which matched private-equity nursing-home deals to resident-level Medicare data and found that going to a private-equity-owned nursing home was associated with a statistically significant increase in short-term mortality, on the order of roughly ten percent relative to comparison facilities, alongside declines in frontline nurse staffing and patterns of charges consistent with cost-cutting after a buyout. The mechanism the literature points to is staffing: the largest controllable cost in a nursing home is nurse labor, the pressure to service acquisition debt and pay rent and management fees falls on the operating margin, and lower nurse staffing is independently one of the most robust predictors of worse resident outcomes in all of long-term care. The ownership change, the staffing decline, and the outcome decline are three links in one chain.

The decisive methodological fact is that no single dataset shows this. The ownership file knows who owns the facility but not how it performs. The quality file knows the staffing and the star rating but not who is behind the door. The CHOW file knows a deal happened but nothing about its consequences. The finding—the link from ownership to staffing to outcome—exists only in the join. Linking ownership to staffing and outcomes is not an optional refinement; it is the method itself, the mechanism by which the documented post-acquisition effects were established in the first place. The chain of owner to facility to staffing to outcome is the analysis.

The CCN, the enrollment ID, and why they do not match

The reason this is hard—the crux of the whole exercise—is that the two halves of the story are keyed by different identifiers that do not appear together in either half. The quality datasets are keyed by the CMS Certification Number (CCN), the six-digit Medicare provider number that identifies a certified facility for survey, certification, and payment. The CCN is the identifier the Care Compare files, the survey histories, and the payroll-based staffing data all hang on. It is facility-scoped: it answers the question “which building, which certified provider.”
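
Because the CCN is a fixed-width identifier, a spreadsheet export or a numeric cast can silently strip its leading zeros, after which it no longer matches the quality files. The sketch below shows one way to guard against that. The function name normalize_ccn is hypothetical, and the six-character width is taken from the description above; check it against the current release before relying on it.

```python
def normalize_ccn(value):
    """Return a CCN as a six-character, zero-padded string, or None if missing."""
    if value is None:
        return None
    text = str(value).strip().upper()
    if text in ("", "NAN", "NONE"):
        return None
    # A CCN that was read as a float arrives as e.g. "15001.0"; drop the tail.
    if text.endswith(".0"):
        text = text[:-2]
    return text.zfill(6)


print(normalize_ccn(15001))         # a numeric cast lost the leading zero
print(normalize_ccn(" 015001 "))    # stray whitespace from a CSV export
print(normalize_ccn(float("nan")))  # missing values stay missing
```

Returning None for missing values, rather than a padded placeholder string, keeps unmatched rows visibly unmatched when the key is later used in a join.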

The ownership files and the CHOW files, by contrast, are keyed by the PECOS enrollment ID and associate ID—the identifiers from the Provider Enrollment, Chain, and Ownership System, the administrative spine of Medicare enrollment. The enrollment ID identifies a specific enrollment record; the associate ID identifies a party—a provider organization or an owner—across its enrollments. These are the identifiers around which ownership disclosure and ownership transfer are organized, because enrollment is where ownership is collected. And critically—this is the entire difficulty—the ownership and CHOW files do not carry a CCN. You cannot join an owner directly to a star rating, because the owner is described in enrollment-ID space and the rating is described in CCN space, and neither file contains the other's key.

The consequence is concrete and unavoidable. An analyst who downloads the SNF all-owners file and the nursing-home Care Compare file and tries to merge them will find no column in common. The owner of a facility, expressed as a row in enrollment-ID space, simply cannot be placed next to that facility's staffing and outcomes, expressed in CCN space, by any direct operation. The two datasets are about the same facilities—but they describe those facilities in two different languages, and there is no shared word between them. Until that gap is bridged, the ownership-to-outcome question that motivates the entire effort is literally uncomputable.

The enrollment file: the bridge that makes the join possible

The bridge is the CMS enrollment file, and it works for one reason: it carries both keys. The enrollment record for a facility holds its PECOS enrollment ID and associate ID, the key the ownership and CHOW files speak in, and it also holds the facility's CCN, the key the quality files speak in. Because the enrollment file is the one place the two identifiers sit on the same row, it is the translation layer. Reduced to its essence, it is a crosswalk: a table of (enrollment ID, CCN) pairs that lets you convert any enrollment-keyed fact into a CCN-keyed fact and back again.

With that crosswalk in hand, the end-to-end join becomes mechanical. You start from the ownership file, which gives you, for each facility, its owners in enrollment-ID space. You join that file to the enrollment crosswalk on the enrollment ID, which appends the CCN to every ownership row. Now the ownership rows carry a CCN, and they can be joined to the Care Compare quality file on that CCN, which appends the star rating, the staffing level, and the outcome measures. The same chain works for the CHOW file: a change-of-ownership transaction, keyed by enrollment ID, gains a CCN through the crosswalk and can then be placed in time against the facility's quality trajectory. The enrollment-to-CCN crosswalk is what makes the end-to-end join possible—it is the single component without which owner, deal, staffing, and outcome remain four disconnected tables.

The columns below show the shape of the records this join touches—the enrollment-keyed ownership and CHOW side, the bridge itself, and the CCN-keyed quality side—and which key each one carries:

-- cms_provider_ownership / cms_hospital_owners  (PECOS-keyed)
enrollment_id            -- PECOS enrollment id for the facility
associate_id             -- PECOS associate id for the facility / owner
organization_name        -- legal name of the enrolled provider
owner_name               -- owner legal/individual name (free text)
role_code_owner          -- direct / indirect interest, control, mgmt
percentage_ownership     -- disclosed stake, where applicable
type_owner_pe            -- Y/N: owner is a private equity company
type_owner_reit          -- Y/N: owner is a REIT
created_for_acquisition  -- Y/N: entity spun up to hold an acquisition
(NO ccn column on this file)

-- cms_provider_chow  (PECOS-keyed transactions)
enrollment_id            -- the enrollment whose ownership changed
chow_type                -- sale, transfer, merger, consolidation
chow_effective_date      -- date control passed
buyer / seller party ids -- associate ids of the parties

-- enrollment file  (THE BRIDGE -- carries BOTH keys)
enrollment_id            -- joins to ownership / chow
ccn                      -- joins to quality

-- cms_quality (Care Compare)  (CCN-keyed)
ccn                      -- CMS Certification Number (Medicare provider no.)
overall_rating           -- Five-Star overall rating
total_nurse_staffing_hppd -- payroll-based staffing hours per resident day
outcome_measures...      -- rehospitalization, deficiencies, etc.
(NO enrollment_id on this file)
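
The two-hop join these records imply can be sketched on a few toy rows. Everything in the frames below is invented for illustration, and the column names follow the schema sketch above rather than any particular CMS release.

```python
import pandas as pd

# Toy rows only; real column names should be checked against the release.
ownership = pd.DataFrame({
    "enrollment_id": ["E1", "E1", "E2", "E3"],
    "owner_name": ["ABC OPCO LLC", "ABC HOLDINGS LP",
                   "XYZ CARE INC", "SOLO OWNER LLC"],
})
bridge = pd.DataFrame({
    "enrollment_id": ["E1", "E2"],      # E3 has no CCN on file
    "ccn": ["015001", "365432"],
})
quality = pd.DataFrame({
    "ccn": ["015001", "365432"],
    "overall_rating": [2, 4],
    "total_nurse_staffing_hppd": [3.1, 4.2],
})

# Hop 1: enrollment-keyed ownership rows pick up a CCN from the bridge.
with_ccn = ownership.merge(bridge, on="enrollment_id", how="left")
# Hop 2: the CCN now joins straight to the quality measures.
joined = with_ccn.merge(quality, on="ccn", how="left")

print(joined.to_string(index=False))
```

Left joins are used on both hops so that an ownership row with no crosswalk match (E3 here) stays in the output with empty quality columns instead of disappearing, which keeps the gap visible.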

Owner-name normalization: collapsing the holding companies

The enrollment-to-CCN bridge solves the join between the two halves; the second hard problem lives entirely on the ownership side, and it is owner-name normalization. The structure of financialized healthcare ownership is deliberately layered: a single sponsor controls a facility not under one name but through a stack of single-purpose entities—an operating company, a property company, a management-services company, and a chain of holding companies, each typically an LLC named for the facility or the deal. The whole point of the structure is to put distance between the named provider and the ultimate owner, which means the same real-world parent appears in the file under dozens of distinct strings.

To measure a chain's footprint, you have to undo that fragmentation and collapse the layered holding companies into a common parent. A naive grouping on the raw owner name will systematically undercount exactly the largest and most sophisticated owners, the ones that use the most entities and are therefore the most worth measuring. Normalization is a spectrum of effort. The cheap first pass strips entity suffixes and punctuation so that “ABC Capital LLC,” “ABC Capital, L.L.C.,” and “ABC Capital” land on one key. The next level blocks on shared addresses, because the single-purpose entities of one sponsor often share a corporate headquarters or registered-agent address even when their names differ. The most rigorous level traces the indirect-ownership role codes and holding-company flags upward, following the chain layer by layer until it reaches the entity at the apex, typically the private-equity fund or the REIT, and assigns every facility beneath it to that parent.
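
The most rigorous level, tracing upward to the apex, can be sketched as a walk over child-to-parent edges. This is a simplified illustration and makes two assumptions: that such edges have already been derived from the indirect-ownership records, and that each entity has at most one disclosed parent. Real chains can have several owners per layer. The function name apex_owner and all entity names are hypothetical.

```python
def apex_owner(entity, parent_of, max_depth=25):
    """Follow child -> parent edges upward until no parent is disclosed."""
    seen = {entity}
    current = entity
    for _ in range(max_depth):
        parent = parent_of.get(current)
        if parent is None or parent in seen:  # top of the chain, or a cycle
            return current
        seen.add(parent)
        current = parent
    return current


# Hypothetical edges: each entity maps to the entity holding an interest in it.
parent_of = {
    "MAPLE GROVE OPCO LLC": "MAPLE HOLDCO LLC",
    "MAPLE HOLDCO LLC": "ABC CAPITAL FUND II LP",
    "OAK RIDGE OPCO LLC": "ABC CAPITAL FUND II LP",
    "LOOP A LLC": "LOOP B LLC",
    "LOOP B LLC": "LOOP A LLC",
}

for name in ["MAPLE GROVE OPCO LLC", "OAK RIDGE OPCO LLC", "LOOP A LLC"]:
    print(name, "->", apex_owner(name, parent_of))
```

The cycle guard and depth cap matter because self-reported ownership can contain circular or very deep disclosures; without them the walk might never terminate.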

The combination of the two operations is what turns four files into a chain-level dataset. The bridge attaches quality to ownership; normalization attaches ownership to a parent. Only after both have run can you ask the footprint-versus-quality question—how many facilities does this parent control, and what is their average staffing and star rating—because only then is “this parent” a coherent entity with a known set of facilities and each facility a known set of outcomes. Get the normalization wrong and a thousand-facility sponsor dissolves into a thousand one-facility owners, and the consolidation it represents disappears from the data.

Assembling a single facility's history

The most instructive way to learn the cross-walk is to run it on one facility before scaling it to a chain. Pick a nursing home by its CCN—the number Care Compare lists it under—and the goal is to assemble its complete federal story: who owns it, the deal that put them in control, and how it has performed.

Because you start from a CCN but the ownership files are enrollment-keyed, the first step runs the bridge in reverse: look up the facility in the enrollment crosswalk to translate its CCN into its PECOS enrollment ID. With the enrollment ID, you query the ownership file for every disclosed owner of that enrollment: the operating company, the property company, the management company, the holding companies, and, if present, the private-equity or REIT entity at the top, each with its role code, its percentage stake where disclosed, and its entity-type flags. That is the ownership snapshot. Next you query the CHOW file for the same enrollment ID, which returns the change-of-ownership transactions (the sales, transfers, mergers, and consolidations), each with an effective date. Ordered in time, the CHOW rows are the facility's transaction history: the moments its control changed hands.

Finally you attach the outcomes. Because you began from the CCN, the quality file joins directly—the Five-Star overall rating, the payroll-based nurse-staffing hours per resident day, the deficiency and rehospitalization measures—and, where CMS publishes the staffing and rating over time, you can line those measures up against the CHOW effective dates to see the trajectory of staffing and quality before and after the facility changed owners. That single assembled record—owner, deal, staffing, outcome, in time order—is the consolidation story for one facility, and it is the unit that, repeated and rolled up, becomes the story of a chain.
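
The single-facility assembly described in this section might look like the sketch below. The four frames are toy stand-ins for the bridged tables, and the function name facility_history and every value in the frames are hypothetical.

```python
import pandas as pd

bridge = pd.DataFrame({"enrollment_id": ["E1", "E2"],
                       "ccn": ["015001", "365432"]})
ownership = pd.DataFrame({
    "enrollment_id": ["E1", "E1", "E2"],
    "owner_name": ["MAPLE GROVE OPCO LLC", "ABC CAPITAL FUND II LP",
                   "XYZ CARE INC"],
    "role_code_owner": ["direct", "indirect", "direct"],
})
chow = pd.DataFrame({
    "enrollment_id": ["E1", "E1"],
    "chow_type": ["sale", "transfer"],
    "chow_effective_date": ["2019-07-01", "2015-03-15"],
})
quality = pd.DataFrame({"ccn": ["015001"], "overall_rating": [2]})


def facility_history(ccn):
    """Assemble owners, dated deals, and quality for one CCN."""
    # Run the bridge in reverse: CCN -> enrollment id(s).
    ids = bridge.loc[bridge["ccn"] == ccn, "enrollment_id"].tolist()
    owners = ownership[ownership["enrollment_id"].isin(ids)]
    deals = chow[chow["enrollment_id"].isin(ids)].copy()
    deals["chow_effective_date"] = pd.to_datetime(deals["chow_effective_date"])
    deals = deals.sort_values("chow_effective_date")
    # Quality needs no bridge because the lookup started from the CCN.
    measures = quality[quality["ccn"] == ccn]
    return {"owners": owners, "deals": deals, "quality": measures}


record = facility_history("015001")
for part in ("owners", "deals", "quality"):
    print(record[part].to_string(index=False))
```

The reverse lookup returns a list because one CCN can correspond to more than one enrollment record; the sketch keeps all of them instead of assuming a one-to-one mapping.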

Rolling facilities up to a chain

The payoff of normalization is the rollup—the move from one facility to the parent that controls many. Once every ownership row carries both a CCN (through the bridge, so quality is attached) and a normalized parent key (through name normalization, so the layered entities are collapsed), you can group by the parent and compute, in a single pass, the two numbers that define a chain: its footprint—the count of distinct facilities it controls—and its average quality—the mean star rating and mean staffing across those facilities. Setting the footprint against the quality is the headline measure of consolidation analysis: it answers, for each parent, how large a slice of the nation's post-acute capacity it controls and how that capacity performs.

The rollup is also what makes cross-sector and cross-state patterns visible. A parent that appears across the SNF, home-health, and hospice ownership files—captured by normalizing owner identities across all three—is a sponsor that holds a patient across the continuum of post-acute care under common control, a structure invisible in any single file. Mapping a normalized parent's facilities by state turns an abstract “roll-up” into a footprint with geography, and weighting each facility by the population it serves turns it into an exposure: how many residents are in the hands of this particular owner, at this particular average level of staffing. The rollup is where the four datasets stop being a join exercise and become a measurement of market structure.
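
The exposure idea, weighting each facility by the population it serves, can be sketched as a resident-weighted average by parent and state. The residents column is an assumption made for illustration (for example an average daily census); all values are invented.

```python
import pandas as pd

# One row per facility, after the bridge and normalization have both run.
facilities = pd.DataFrame({
    "parent": ["ABC CAPITAL", "ABC CAPITAL", "ABC CAPITAL", "XYZ CARE"],
    "state": ["OH", "OH", "PA", "OH"],
    "ccn": ["365001", "365002", "395001", "365003"],
    "residents": [100, 50, 150, 80],
    "staffing_hppd": [3.0, 4.0, 3.4, 4.1],
})

facilities["resident_hours"] = (facilities["residents"]
                                * facilities["staffing_hppd"])

exposure = (facilities
            .groupby(["parent", "state"], as_index=False)
            .agg(facilities=("ccn", "nunique"),
                 residents=("residents", "sum"),
                 resident_hours=("resident_hours", "sum")))
# Resident-weighted mean: total staffing hours divided by total residents.
exposure["weighted_hppd"] = exposure["resident_hours"] / exposure["residents"]
exposure = exposure.drop(columns="resident_hours")

print(exposure.to_string(index=False))
```

A simple mean across facilities would treat a 50-bed home and a 150-bed home equally; the weighted version describes what the typical resident under that parent experiences.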

Change of ownership in time: the deal as an event

The CHOW file deserves its own emphasis, because it supplies the dimension the ownership snapshot lacks: time. The all-owners file is a photograph—who owns the facility as of the latest disclosure. The CHOW file is the sequence of transactions that produced that photograph: each row is a discrete change-of-ownership event, with a type (sale, transfer, merger, consolidation) and an effective date. Read together, the two files turn ownership from a static fact into a history.

The reason this matters for the central question is that the post-acquisition quality analysis is fundamentally an event study, and the CHOW effective date is the event. To ask whether staffing fell after a buyout, you need the date of the buyout to define “before” and “after,” and the CHOW file is the public record of that date. Bridged to the CCN and joined to the time series of payroll-based staffing, a CHOW transaction becomes the anchor of a before-and-after comparison at the facility level, and aggregated across the facilities a parent acquired, it becomes the basis for estimating the parent's effect on staffing and outcomes. The created-for-acquisition flag in the ownership file and the CHOW effective dates are complementary M&A signals: the flag marks the entity spun up to hold a purchase, the CHOW row dates the purchase, and together they let you reconstruct the tempo of a roll-up year by year.
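
A before-and-after comparison anchored on the CHOW effective date can be sketched as follows. The quarterly staffing series and the deal date are invented, and the CHOW row is assumed to have already gained its CCN through the crosswalk.

```python
import pandas as pd

# Hypothetical quarterly staffing for one CCN, plus its bridged CHOW date.
staffing = pd.DataFrame({
    "ccn": ["015001"] * 6,
    "quarter_end": pd.to_datetime(["2018-09-30", "2018-12-31", "2019-03-31",
                                   "2019-09-30", "2019-12-31", "2020-03-31"]),
    "staffing_hppd": [3.8, 3.9, 3.7, 3.4, 3.3, 3.2],
})
chow = pd.DataFrame({
    "ccn": ["015001"],
    "chow_effective_date": pd.to_datetime(["2019-07-01"]),
})

panel = staffing.merge(chow, on="ccn", how="inner")
# Label each quarter relative to the date control passed.
after_deal = panel["quarter_end"] >= panel["chow_effective_date"]
panel["period"] = after_deal.map({False: "before", True: "after"})

means = (panel.groupby(["ccn", "period"])["staffing_hppd"]
         .mean()
         .unstack("period"))
means["change"] = means["after"] - means["before"]

print(means[["before", "after", "change"]].round(2).to_string())
```

This is a descriptive difference for one facility, not an estimate of the owner's effect: a real event study would add comparison facilities and control for trends that would have occurred without the deal.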

Python workflow: the cross-walk from owner to outcome

The script below performs the full cross-walk over the genuine data.cms.gov API. It pages the SNF all-owners file and the CHOW file (both enrollment-keyed), builds the enrollment-to-CCN crosswalk from the enrollment file, uses that bridge to attach the CCN-keyed Care Compare star rating and staffing to every ownership row, normalizes owner names to a common parent key, and rolls facilities up to weigh each parent's footprint against its average quality. The CHOW frame is loaded but does not enter the rollup; it is there for the dated-deal analysis described in the previous section. No API key is required for the public datasets. Dataset UUIDs are left as placeholders because CMS re-versions these files; resolve the current UUIDs from the catalog pages, and validate the CCN zero-padding and column names against the current release before relying on the counts.

import requests
import pandas as pd

# ---------------------------------------------------------------
# Tracing one chain through four CMS datasets.
# All sources are public and key-free on data.cms.gov.
#
# The crux is that the four files do not share a join key:
#   - Care Compare quality datasets are keyed by the CCN
#     (CMS Certification Number, the Medicare provider number)
#   - the all-owners and change-of-ownership (CHOW) files are
#     keyed by PECOS enrollment_id / associate_id and carry NO CCN
# The CMS enrollment file carries BOTH the enrollment_id and the
# CCN, so it is the bridge that makes the end-to-end join possible.
#
# This script:
#   1. Pulls the SNF all-owners and CHOW files (enrollment-keyed)
#   2. Builds the enrollment_id -> CCN crosswalk from the
#      enrollment file
#   3. Attaches Care Compare star rating and staffing (CCN-keyed)
#   4. Normalizes owner names and rolls facilities up to a common
#      parent to weigh a chain’s footprint against its quality
# ---------------------------------------------------------------

API = "https://data.cms.gov/data-api/v1/dataset/{uuid}/data"

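# Resolve current dataset UUIDs from the catalog if a request 404s;
# CMS re-versions these files periodically.
# Caution: the Care Compare files may be served through the CMS
# Provider Data Catalog (data.cms.gov/provider-data), whose API path
# and paging parameters can differ from the template above. Confirm
# the endpoint for the "quality" source before running.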
UUIDS = {
    "snf_owners": "REPLACE_WITH_SNF_ALL_OWNERS_UUID",
    "snf_chow": "REPLACE_WITH_SNF_CHOW_UUID",
    "enrollment": "REPLACE_WITH_SNF_ENROLLMENT_UUID",
    "quality": "REPLACE_WITH_PROVIDER_INFO_UUID",
}


def fetch(name, page_size=5000):
    """Page through one dataset and return it with snake_case columns."""
    rows, offset = [], 0
    url = API.format(uuid=UUIDS[name])
    while True:
        resp = requests.get(url, params={"size": page_size,
                                         "offset": offset},
                            timeout=120)
        # Fail loudly on a stale UUID instead of parsing an error payload.
        resp.raise_for_status()
        page = resp.json()
        if not page:
            break
        rows.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    df = pd.DataFrame(rows)
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    return df


def pick(df, *names):
    for n in names:
        if n in df.columns:
            return n
    raise KeyError(names)


def normalize_owner(name):
    # Collapse layered holding companies toward a common parent:
    # drop punctuation and trailing entity-type tokens so
    # "ABC Capital, L.L.C." and "ABC Capital LLC" land on the same
    # key. Matching whole tokens (not substrings) avoids mangling
    # names such as "ABC INCOME FUND". Production work adds
    # shared-address blocking; this is the naive first pass.
    if not isinstance(name, str):
        return ""
    suffixes = {"LLC", "LP", "INC", "CORP", "HOLDINGS", "HOLDING"}
    tokens = name.upper().replace(".", "").replace(",", " ").split()
    while tokens and tokens[-1] in suffixes:
        tokens.pop()
    return " ".join(tokens)


owners = fetch("snf_owners")
chow = fetch("snf_chow")
enroll = fetch("enrollment")
quality = fetch("quality")

# --- Step 1: the enrollment_id -> CCN bridge -------------------
enr_id = pick(enroll, "enrollment_id", "associate_id")
ccn = pick(enroll, "ccn", "provider_ccn", "cms_certification_number")
bridge = enroll[[enr_id, ccn]].dropna().drop_duplicates()
print(f"Crosswalk pairs (enrollment_id -> CCN): {len(bridge):,}")

# --- Step 2: attach quality through the bridge -----------------
own_enr = pick(owners, "enrollment_id", "associate_id")
q_ccn = pick(quality, "cms_certification_number_(ccn)", "ccn",
             "federal_provider_number")
q_star = pick(quality, "overall_rating", "overall_star_rating")
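# Only hours-per-resident-day columns are accepted here; a turnover
# measure is a different quantity and must not stand in for staffing.
q_staff = pick(quality, "reported_total_nurse_staffing_hours_per_resident_per_day",
               "adjusted_total_nurse_staffing_hours_per_resident_per_day")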

owners = owners.merge(bridge, left_on=own_enr, right_on=enr_id, how="left")
q = quality[[q_ccn, q_star, q_staff]].rename(
    columns={q_ccn: ccn, q_star: "stars", q_staff: "staffing"})
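q[ccn] = q[ccn].astype(str).str.zfill(6)
# Pad only real CCNs: astype(str) on a missing value would produce
# the bogus key "000nan" for every unbridged ownership row.
has_ccn = owners[ccn].notna()
owners.loc[has_ccn, ccn] = owners.loc[has_ccn, ccn].astype(str).str.zfill(6)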
owners = owners.merge(q, on=ccn, how="left")

# --- Step 3: roll facilities up to a common parent ------------
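# Use a name column, not an ID: normalize_owner works on owner names.
# The "_-_" variant covers headers written as "ORGANIZATION NAME - OWNER"
# (an assumption about the release; verify against the current file).
own_name = pick(owners, "organization_name_owner",
                "organization_name_-_owner", "owner_name")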
owners["parent"] = owners[own_name].map(normalize_owner)
owners["stars"] = pd.to_numeric(owners["stars"], errors="coerce")
owners["staffing"] = pd.to_numeric(owners["staffing"], errors="coerce")

chain = (owners[owners["parent"].ne("")]
         .groupby("parent")
         .agg(facilities=(own_enr, "nunique"),
              avg_stars=("stars", "mean"),
              avg_staffing=("staffing", "mean"))
         .reset_index()
         .sort_values("facilities", ascending=False))

print("\nLargest parents: footprint vs. average quality")
print(chain.head(20).to_string(index=False,
      formatters={"avg_stars": "{:.2f}".format,
                  "avg_staffing": "{:.2f}".format}))

Two practical notes apply. First, the join is only as good as the crosswalk: if the enrollment file's (enrollment ID, CCN) pairs are stale or incomplete, ownership rows will fail to pick up a CCN and silently drop out of the quality join, so the script should report the match rate—the share of ownership rows that successfully acquired a CCN—and treat a low rate as a data problem rather than a finding. Second, the normalization step in the script is the deliberately naive suffix-stripping pass; it is enough to demonstrate the rollup but not enough to trust the largest parents, which need the shared-address blocking and indirect-chain tracing described above. The structure of the script—bridge, attach, normalize, roll up—is the durable part; both the crosswalk freshness and the normalization depth are the levers a production pipeline tightens.
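
The match-rate check from the first note can be sketched with the indicator option of a pandas merge. The frames are toy data; in the script the same check would run on the ownership frame and the crosswalk built in Step 1.

```python
import pandas as pd

owners = pd.DataFrame({"enrollment_id": ["E1", "E1", "E2", "E3", "E4"]})
bridge = pd.DataFrame({"enrollment_id": ["E1", "E2"],
                       "ccn": ["015001", "365432"]})

# indicator=True adds a _merge column recording where each row matched.
checked = owners.merge(bridge, on="enrollment_id", how="left", indicator=True)
matched = checked["_merge"].eq("both")

row_rate = matched.mean()
enrollment_rate = (checked.loc[matched, "enrollment_id"].nunique()
                   / checked["enrollment_id"].nunique())

print(f"Ownership rows with a CCN: {row_rate:.1%}")
print(f"Enrollments with a CCN:    {enrollment_rate:.1%}")
```

Reporting both rates is deliberate: facilities with many disclosed owners inflate the row-level rate, so the enrollment-level rate is the better guide to how many facilities actually bridged.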

Limitations and analytical caveats

The four-dataset cross-walk is the most complete public method for connecting healthcare ownership to outcomes, but it rests on records with real limits, and an analyst must hold them in mind before drawing conclusions.

The ownership data is self-reported. The ownership and CHOW records originate in what the provider disclosed to CMS at enrollment, revalidation, and transfer. An owner who wishes to obscure a relationship has both incentive and, through the layered-entity structure, the means, and CMS does not independently audit every disclosure. The entity-type flags—including the private-equity and REIT flags that make the analysis possible—are likewise self-characterized, so a financial owner that does not describe itself as private equity may not be flagged. The files are the best available view of ownership, not a verified ledger.

The crosswalk can be imperfect, and the chain can break. The entire method depends on the enrollment file carrying a clean (enrollment ID, CCN) pair for every facility, and that mapping is not flawless: enrollments without a current CCN, facilities whose CCN changed, and timing mismatches between the file releases all produce ownership rows that fail to bridge to quality. Separately, the upward ownership chain, built from the indirect-interest records and holding-company flags, does not always give a clean parent-to-parent edge, so the ultimate beneficial owner can sit a layer beyond what was disclosed. A low crosswalk match rate or a broken chain is a property of the data, not a substantive result.

Owner-name fragmentation drives the headline numbers. Because a sponsor's facilities are held through many single-purpose entities, the count of facilities per parent is hostage to the quality of name-and-address normalization. Naive grouping undercounts the biggest owners; over-aggressive normalization can merge genuinely distinct owners that happen to share a generic name. Any footprint statistic should be reported alongside the normalization method that produced it, because two reasonable analysts using two normalization passes can arrive at materially different chain sizes from the same underlying file.
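
How much the footprint depends on the normalization pass can be seen on a handful of invented rows. The helper strip_suffixes is a hypothetical stand-in for a naive first pass, not a recommended production rule.

```python
import pandas as pd


def strip_suffixes(name):
    """Naive pass: drop punctuation and trailing entity-type tokens."""
    suffixes = {"LLC", "LP", "INC", "CORP", "HOLDINGS"}
    tokens = name.upper().replace(".", "").replace(",", " ").split()
    while tokens and tokens[-1] in suffixes:
        tokens.pop()
    return " ".join(tokens)


rows = pd.DataFrame({
    "ccn": ["015001", "015002", "015003", "365432"],
    "owner_name": ["ABC Capital LLC", "ABC Capital, L.L.C.",
                   "ABC Capital Holdings LP", "XYZ Care Inc."],
})

# Footprint under raw names versus under the normalized parent key.
raw = rows.groupby("owner_name")["ccn"].nunique()
rows["parent"] = rows["owner_name"].map(strip_suffixes)
normalized = rows.groupby("parent")["ccn"].nunique()

print("Largest owner, raw names:       ", raw.max())
print("Largest owner, normalized names:", normalized.max())
```

The same four rows yield a largest owner of one facility or of three, depending only on the grouping key, which is why the method belongs next to the number.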

A CCN is a certification, not a permanent building, and the grain differs across files. A CCN can be reissued, retired, or transferred, and after a change of ownership the new operator may take over the existing CCN or obtain a new one—which complicates lining up a facility's quality history across a transaction and is exactly why the CHOW effective date matters as the anchor. The grains also differ: the ownership file is one row per provider-owner relationship, the CHOW file one row per transaction, the quality file one row per certified facility per reporting period. Every count must be explicit about whether it is counting relationships, deals, facilities, or owners; conflating them is the most common error in this work.
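
The grain distinction can be made concrete by counting the same toy table three ways. The rows are invented; the point is only that the three numbers differ.

```python
import pandas as pd

# One row per provider-owner relationship, already bridged and normalized.
rows = pd.DataFrame({
    "enrollment_id": ["E1", "E1", "E1", "E2", "E2"],
    "ccn": ["015001", "015001", "015001", "365432", "365432"],
    "parent": ["ABC CAPITAL", "ABC CAPITAL", "MAPLE MGMT",
               "ABC CAPITAL", "XYZ CARE"],
})

counts = {
    "relationships": len(rows),            # provider-owner rows
    "facilities": rows["ccn"].nunique(),   # certified providers
    "owners": rows["parent"].nunique(),    # normalized parents
}
print(counts)
```

Labeling each count with its grain at the moment it is computed is a cheap habit that prevents a relationship count from being reported as a facility count later.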

Held with those caveats, the four CMS datasets (cms_provider_ownership, cms_hospital_owners, cms_provider_chow, and cms_quality) are the authoritative, openly downloadable record of consolidation in American post-acute and institutional care. None of them shows the trend alone. Bridged through the enrollment-to-CCN crosswalk and collapsed through owner-name normalization, together they turn the abstraction of “consolidation” into something concrete: a named parent, a counted footprint, a dated deal, and a measured outcome. That is the chain of owner to facility to staffing to outcome, made traceable.

Related writing

CMS Provider Ownership: The Federal Database Behind Private Equity in Nursing Homes, Home Health, and Hospice — The all-owners files are the ownership half of this cross-walk, and that guide goes deep on the 42 CFR 455.104 schema, the entity-type flags, and the opco/propco/REIT chains that owner-name normalization has to collapse.

CMS Change of Ownership: The Federal Record of Hospital and Nursing-Home M&A — The CHOW transactions supply the dated deal that anchors the before-and-after quality comparison, and that guide details the transaction types and effective dates this analysis treats as the consolidation event.

CMS Skilled Nursing Facility Data: Star Ratings, Staffing, and the Quality Metrics Behind 15,000 Nursing Homes — The Care Compare quality datasets are the outcome half of the cross-walk, and that guide explains the Five-Star rating, the payroll-based staffing measure, and the CCN that the enrollment bridge ties the ownership records into.