CDC PLACES: The Small Area Health Estimates Behind County and Census Tract Disease Prevalence Data

AI Analytics

Every county health department in the United States needs to know whether smoking rates in its jurisdiction are above or below the national average—and whether they vary by neighborhood. The BRFSS telephone survey that produces national behavioral risk estimates cannot answer that question for most counties: sample sizes are simply too small. CDC PLACES fills that gap, using a statistical modeling technique called multilevel regression and poststratification to estimate the prevalence of 36 health measures for every county, census tract, and ZIP code tabulation area in the country.

What CDC PLACES Is

CDC PLACES—Population Level Analysis and Community Estimates—is a collaboration between the Centers for Disease Control and Prevention, the Robert Wood Johnson Foundation, and the CDC Foundation. The program produces model-based small area estimates of health measures for all 3,100+ US counties, 29,000+ census tracts, and 28,000+ ZIP code tabulation areas (ZCTAs). It publishes annually, with the current release year reflecting BRFSS data from the prior survey cycle.

PLACES is the successor to the 500 Cities Project, which ran from 2016 through 2019 and covered only the 500 largest US cities at the census-tract level. When CDC expanded the program to national coverage in 2020, it renamed it PLACES to reflect the broader geographic scope. The underlying methodology—multilevel regression and poststratification, described in detail below—remained consistent through the transition, which means the 500 Cities data and PLACES data can be used together to construct a census-tract time series extending back to survey year 2014.

The program covers five categories of health measures: health outcomes (chronic diseases and conditions), prevention behaviors (screening and clinical services), unhealthy behaviors (risk factors), disability types, and—added in more recent releases—social determinants of health. Across those five categories, PLACES currently publishes 36 measures at the county level, with a subset available at census tract and ZCTA. All estimates are expressed as age-adjusted or crude prevalences, depending on the measure, and are accompanied by 95 percent confidence intervals that reflect both sampling uncertainty and model uncertainty.
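The difference between crude and age-adjusted prevalence can be illustrated with direct standardization. The sketch below is a minimal illustration, not CDC's procedure: the age bands, case counts, and standard-population weights are all hypothetical values chosen for easy arithmetic.

```python
def crude_rate(cases, population):
    """Crude prevalence: all cases divided by all people."""
    return sum(cases) / sum(population)


def age_adjusted_rate(cases, population, standard_weights):
    """Direct standardization: weight each age-specific rate by the share
    of a reference (standard) population in that age band."""
    if abs(sum(standard_weights) - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")
    return sum(
        w * c / n for c, n, w in zip(cases, population, standard_weights)
    )


# Hypothetical county with three age bands: 18-44, 45-64, 65+
cases = [20, 60, 120]
population = [1000, 1000, 1000]
# Hypothetical standard population shares (NOT the official US standard)
weights = [0.5, 0.3, 0.2]

print(f"crude:        {crude_rate(cases, population):.3f}")
print(f"age-adjusted: {age_adjusted_rate(cases, population, weights):.3f}")
```

Because this hypothetical county is older than the standard population, its age-adjusted prevalence comes out lower than its crude prevalence. Crude values describe the actual local burden; age-adjusted values are the better choice when comparing places with different age structures.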

Why It Exists: The BRFSS Sample Problem

The Behavioral Risk Factor Surveillance System is the primary source of state-level health behavior data in the United States. BRFSS conducts telephone surveys of adults in all 50 states and DC, collecting responses on a rotating battery of health topics—smoking, physical activity, health screenings, chronic disease diagnoses, and dozens of other measures. The survey reaches approximately 400,000 adults per year, making it one of the largest health surveys in the world.

At the national and state level, 400,000 respondents provide highly reliable estimates. The problem appears when analysts need county or sub-county estimates. The average US county has roughly 100,000 residents; after accounting for the fact that the survey targets adults and achieves only partial response rates, most counties end up with fewer than 100 BRFSS respondents per year. Many rural counties have fewer than 30. Direct survey estimates from samples that small carry 95 percent confidence intervals so wide as to be analytically useless—a smoking prevalence estimate of “22% (95% CI: 10%–34%)” cannot distinguish a county with a genuine smoking crisis from one with average rates.
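The width of those intervals follows directly from the sample size. The sketch below uses the simple Wald (normal approximation) interval and assumes a simple random sample; real BRFSS estimates use survey weights and a complex design, which typically make the intervals wider still. The function name `wald_ci` is illustrative.

```python
import math


def wald_ci(p, n, z=1.96):
    """Normal-approximation 95% interval for a proportion p from n respondents.
    Assumes simple random sampling (no design effect)."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Same 22% point estimate at different sample sizes
for n in (30, 46, 100, 1000, 10000):
    low, high = wald_ci(0.22, n)
    print(f"n={n:>6}: 22% (95% CI: {low:.1%} to {high:.1%})")
```

Under these assumptions, an interval of roughly 10% to 34% around a 22% estimate corresponds to a sample of only about 46 respondents.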

The sub-county problem is worse still. Census tracts average about 4,000 residents, and most tracts receive zero BRFSS respondents in any given year. Direct estimation is impossible at that geographic level without a survey specifically designed for it—which would cost billions of dollars annually if conducted nationwide. The 500 Cities Project was launched precisely because local health departments needed tract-level data to target interventions but had no way to produce it from existing surveys.

The Methodology: Multilevel Regression and Poststratification

PLACES produces its estimates through a three-step process known as multilevel regression and poststratification, abbreviated MRP. The method was developed in political science for estimating state-level public opinion from national surveys, and has been adapted extensively for public health applications over the past two decades.

Step 1: Multilevel Logistic Regression

CDC analysts pool BRFSS data across multiple survey years and fit a multilevel logistic regression model predicting whether an individual respondent reports a given health behavior or condition. The predictors in the model are individual-level demographic characteristics available in BRFSS: age group, sex, race and ethnicity, educational attainment, and household income. The model also includes a state-level random effect, which captures systematic state-level differences in health outcomes not explained by the individual demographics—differences in state health policy, climate, industry composition, and other contextual factors. For some measures, the model incorporates additional geographic predictors such as metropolitan/non-metropolitan status.

The multilevel structure is critical. An ordinary logistic regression would treat every respondent as drawn from the same population, ignoring the fact that a respondent in Mississippi and a respondent in Colorado with identical demographic profiles still face very different background health environments. The random state effect captures that variation and allows the model to “borrow strength” across geographies: even states with small BRFSS samples contribute to a pooled estimate of state-level variation that informs all state estimates.
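The idea of borrowing strength can be shown without fitting a full multilevel model. The sketch below is a deliberately simplified stand-in: a shrinkage (beta-binomial style) estimator that pulls each state's raw rate toward the pooled rate, with less pull as the state's sample grows. It is not the PLACES regression model; the state labels, counts, and the `prior_strength` value are hypothetical.

```python
def pooled_rate(counts):
    """Overall rate across all groups; counts maps group -> (yes, n)."""
    total_yes = sum(yes for yes, _ in counts.values())
    total_n = sum(n for _, n in counts.values())
    return total_yes / total_n


def shrunk_rate(yes, n, prior_mean, prior_strength=50):
    """Partial pooling: behaves as if prior_strength extra respondents
    answered at the pooled rate. Small samples move most toward the pool."""
    return (yes + prior_strength * prior_mean) / (n + prior_strength)


# Hypothetical respondents per state: (reported the behavior, total sampled)
counts = {"A": (6, 20), "B": (900, 5000), "C": (0, 0)}
overall = pooled_rate(counts)

for state, (yes, n) in counts.items():
    raw = yes / n if n else float("nan")
    print(f"{state}: raw={raw:.3f}  shrunk={shrunk_rate(yes, n, overall):.3f}")
```

State A's noisy 30% is pulled most of the way toward the pooled rate, state B barely moves, and state C, with no respondents at all, falls back entirely on the pooled estimate. A multilevel model produces the same qualitative behavior through its random effects, with the amount of shrinkage estimated from the data rather than fixed in advance.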

Step 2: Poststratification

The regression model produces estimated cell probabilities: the probability that a person with a specific age-sex-race-education-income profile reports a given health measure. Step 2 applies those cell probabilities to actual population counts from the Census Bureau's American Community Survey. The ACS provides five-year estimates of the number of adults in each county, census tract, and ZCTA who fall into each demographic cell (age × sex × race × education × income). Multiplying the estimated cell probability by the cell population count, then summing across all cells within a geography, produces a population-weighted prevalence estimate for that geography.
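The multiply-and-sum arithmetic can be sketched in a few lines. The example below uses only two demographic dimensions to keep the cell table readable; the cell probabilities and tract population counts are hypothetical, and `poststratify` is an illustrative name rather than part of any PLACES tooling.

```python
def poststratify(cell_probs, cell_counts):
    """Population-weighted prevalence for one geography.

    cell_probs:  {cell: modeled probability for people in that cell}
    cell_counts: {cell: number of adults in that cell in this geography}
    """
    total = sum(cell_counts.values())
    if total == 0:
        raise ValueError("geography has no population")
    expected_cases = sum(cell_probs[c] * n for c, n in cell_counts.items())
    return expected_cases / total


# Hypothetical modeled probabilities by (age group, sex)
cell_probs = {
    ("18-44", "F"): 0.10, ("18-44", "M"): 0.14,
    ("65+", "F"): 0.20, ("65+", "M"): 0.26,
}

# Two hypothetical tracts with different demographic mixes
young_tract = {("18-44", "F"): 400, ("18-44", "M"): 400,
               ("65+", "F"): 100, ("65+", "M"): 100}
older_tract = {("18-44", "F"): 100, ("18-44", "M"): 100,
               ("65+", "F"): 400, ("65+", "M"): 400}

print(f"young tract: {poststratify(cell_probs, young_tract):.3f}")
print(f"older tract: {poststratify(cell_probs, older_tract):.3f}")
```

The same cell probabilities yield different prevalences for the two tracts purely because their demographic composition differs, which is the core of the method.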

The poststratification step is what gives PLACES its ability to produce estimates for geographies with zero BRFSS respondents. Because the model has an estimated probability for every demographic cell, and because the ACS provides population counts for every tract, PLACES can generate a tract-level estimate even for tracts that contributed no BRFSS data. The estimate is not a direct survey result but a model prediction informed by the demographic composition of the tract and the behavioral patterns of similar people elsewhere.

Step 3: Aggregation and Uncertainty

After producing tract-level estimates, PLACES aggregates up to county and ZCTA by population-weighting. Uncertainty in the final estimates comes from two sources: sampling uncertainty in the BRFSS data (which generates uncertainty in the regression coefficients) and the natural variability of demographic composition within cells. CDC propagates this uncertainty through a parametric bootstrapping procedure to produce 95 percent confidence intervals for each estimate.
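A rough sketch of the aggregation and a simulation-based interval follows. It is an illustration under strong simplifying assumptions, not CDC's procedure: tract estimates are drawn independently from normal distributions with hypothetical standard errors, whereas in the real model tract estimates share regression coefficients and are therefore correlated. All numbers are made up.

```python
import random


def weighted_prevalence(estimates, populations):
    """Population-weighted average of tract-level prevalences."""
    return sum(e * p for e, p in zip(estimates, populations)) / sum(populations)


def simulated_interval(estimates, std_errors, populations, draws=2000, seed=0):
    """Percentile interval from repeated draws of the tract estimates.
    Assumes independent normal errors per tract (a simplification)."""
    rng = random.Random(seed)
    sims = []
    for _ in range(draws):
        drawn = [
            min(1.0, max(0.0, rng.gauss(e, se)))
            for e, se in zip(estimates, std_errors)
        ]
        sims.append(weighted_prevalence(drawn, populations))
    sims.sort()
    return sims[int(0.025 * draws)], sims[int(0.975 * draws) - 1]


# Hypothetical county made of three tracts
tract_estimates = [0.30, 0.20, 0.25]
tract_std_errors = [0.03, 0.02, 0.025]
tract_populations = [2000, 5000, 3000]

county = weighted_prevalence(tract_estimates, tract_populations)
low, high = simulated_interval(tract_estimates, tract_std_errors, tract_populations)
print(f"county estimate: {county:.3f} (simulated 95% interval: {low:.3f} to {high:.3f})")
```

The county interval is narrower than any single tract's interval because independent tract errors partly cancel when averaged; with correlated errors that cancellation would be weaker.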

The confidence intervals are important to understand. For large counties with populations above 100,000, PLACES estimates are typically quite precise, with confidence intervals of plus or minus two to three percentage points. For small census tracts with fewer than 500 residents, the intervals can be very wide. Analysts should treat PLACES estimates for geographically small, demographically unusual tracts with appropriate caution, and should use the published confidence intervals rather than treating point estimates as ground truth.
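One practical way to act on that advice is to screen estimates by interval width before ranking or mapping them. The helpers below are a sketch; the 10-point width threshold and the example values are hypothetical choices, and a check for non-overlapping intervals is a conservative screen rather than a formal significance test.

```python
def ci_width(low, high):
    """Width of a confidence interval in percentage points."""
    return high - low


def is_imprecise(low, high, max_width=10.0):
    """Flag estimates whose interval is wider than a chosen threshold."""
    return ci_width(low, high) > max_width


def intervals_overlap(a, b):
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical estimates: (point estimate, low limit, high limit), in percent
large_county = (22.0, 20.0, 24.5)
small_tract = (31.0, 19.0, 43.0)

print("county imprecise:", is_imprecise(large_county[1], large_county[2]))
print("tract imprecise: ", is_imprecise(small_tract[1], small_tract[2]))
print("intervals overlap:", intervals_overlap(large_county[1:], small_tract[1:]))
```

Here the tract's point estimate is nine points above the county's, but the overlapping intervals mean the data cannot rule out that the two are the same.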

The 36 Health Measures Across Five Domains

Health Outcomes

The health outcomes domain covers prevalent chronic conditions diagnosed by a clinician. Current PLACES measures include coronary heart disease, stroke, chronic obstructive pulmonary disease (COPD), cancer (all types excluding skin), chronic kidney disease (CKD), diagnosed diabetes, obesity (defined as BMI ≥ 30), arthritis, depression, and high blood pressure. Each of these is derived from BRFSS self-report questions asking respondents whether a doctor or health professional has ever told them they have the condition. Self-report introduces some measurement bias—underdiagnosis affects groups with lower healthcare access—but the BRFSS measures have been validated against clinical records and administrative claims and are generally considered reliable at the population level.

Prevention

Prevention measures capture whether adults are receiving recommended clinical preventive services. PLACES tracks colorectal cancer screening (colonoscopy or FOBT in the recommended time window), mammography among women 50–74, cervical cancer screening (Pap smear within three years for women 21–65), dental visit in the past year, health insurance coverage, routine annual checkup, and cholesterol screening in the past five years. These measures reflect access to and utilization of preventive care, making them sensitive to geographic variation in insurance coverage, provider availability, and health literacy.

Unhealthy Behaviors

The behavioral risk domain includes current smoking (adults who have smoked at least 100 cigarettes in their lifetime and currently smoke some days or every day), binge drinking (men consuming five or more drinks on one occasion in the past 30 days; women consuming four or more), physical inactivity (no leisure-time physical activity in the past month), and sleeping fewer than seven hours per night. These four measures are among the most powerful modifiable determinants of chronic disease, and their geographic variation across PLACES estimates is striking.
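The smoking and binge-drinking definitions above translate directly into classification rules. The sketch below encodes them for a handful of made-up respondents; the field names and response labels are hypothetical and do not correspond to actual BRFSS variable names, and the prevalence shown is an unweighted share.

```python
def is_current_smoker(smoked_100_lifetime, smokes_now):
    """At least 100 lifetime cigarettes AND currently smokes some days or every day."""
    return bool(smoked_100_lifetime) and smokes_now in ("every day", "some days")


def is_binge_drinker(sex, max_drinks_one_occasion):
    """Five or more drinks on one occasion for men, four or more for women
    (past 30 days)."""
    threshold = {"male": 5, "female": 4}[sex]
    return max_drinks_one_occasion >= threshold


# Hypothetical respondents
respondents = [
    {"sex": "male", "smoked_100": True, "smokes_now": "some days", "max_drinks": 4},
    {"sex": "female", "smoked_100": True, "smokes_now": "not at all", "max_drinks": 4},
    {"sex": "female", "smoked_100": False, "smokes_now": "some days", "max_drinks": 1},
    {"sex": "male", "smoked_100": True, "smokes_now": "every day", "max_drinks": 6},
]

smokers = [is_current_smoker(r["smoked_100"], r["smokes_now"]) for r in respondents]
bingers = [is_binge_drinker(r["sex"], r["max_drinks"]) for r in respondents]

print(f"current smoking (unweighted): {sum(smokers) / len(respondents):.0%}")
print(f"binge drinking (unweighted):  {sum(bingers) / len(respondents):.0%}")
```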

Disabilities

The disability domain reflects the six functional disability types used in the ACS: cognitive disability (serious difficulty concentrating, remembering, or making decisions), hearing disability (deafness or serious difficulty hearing), vision disability (blindness or serious difficulty seeing), mobility disability (serious difficulty walking or climbing stairs), self-care disability (difficulty bathing or dressing), and independent living disability (difficulty doing errands alone). PLACES estimates these measures using BRFSS disability questions and ACS disability prevalence data as poststratification anchors.

Social Determinants of Health

The most recently added domain captures three social determinants: housing insecurity (reported difficulty paying housing costs), food insecurity (reported lack of consistent access to enough food), and lack of reliable transportation. These measures bring PLACES closer to the upstream determinants that drive health outcomes, complementing the behavioral and clinical measures in the other four domains. They are available at the county and ZCTA level in current releases and are being extended to census tracts as model validation is completed.

Geographic Disparities: What the Data Shows

The most striking finding from PLACES data is the magnitude of geographic variation in health within the United States—variation that was largely invisible before sub-county estimates existed. Obesity prevalence above 40 percent is concentrated in Appalachian counties across West Virginia, eastern Kentucky, and eastern Tennessee, as well as in Mississippi Delta counties in the Deep South. In contrast, counties anchored by Mountain West university towns—Boulder County, Colorado; Teton County, Wyoming; Gallatin County, Montana—post obesity rates below 20 percent. That 20-percentage-point spread within a single country, among adults living under the same federal food and drug policies, reflects the profound effect of local economic conditions, built environment, food systems, and social norms on health behavior.

Diabetes prevalence shows an even starker geographic pattern. Counties in the Mississippi Delta—Humphreys, Sharkey, Leflore, and Holmes counties in Mississippi—post diabetes prevalence above 15 percent, compared to below 7 percent in Colorado counties along the Front Range. The Delta pattern reflects a convergence of high obesity rates, high rates of poverty and food insecurity, low rates of preventive care utilization, and an agricultural economy with limited access to fresh produce. The health geography of the Delta has been described as a “diabetes belt” by CDC researchers, and PLACES tract-level data reveals that the burden is not uniformly distributed even within Delta counties but is concentrated in the poorest census tracts.

Smoking prevalence illustrates a different geographic axis: the Appalachian tobacco-growing corridor. Eastern Kentucky counties—Leslie, Breathitt, Knott, Perry—post adult smoking rates above 25 percent in PLACES estimates, more than three times the rate in Santa Clara County, California (home to San Jose), where smoking falls below 8 percent. The gradient tracks historical tobacco cultivation, economic depression following coal industry decline, lower tobacco excise tax rates in tobacco-growing states, and cultural acceptance of smoking in rural communities.

The within-city variation that motivated the original 500 Cities Project is also prominent in PLACES data. The 500 Cities release first documented the health divide within Chicago, where census tracts on the predominantly white, higher-income North Side post obesity rates in the low 20s while tracts in the predominantly Black, lower-income South and West Sides post rates above 40 percent. Similar patterns appear in every large American city: health outcomes track income, race, and access to care with remarkable consistency at the neighborhood level, and PLACES provides the data to make those patterns visible and measurable.

Policy Applications

CDC's own programs use PLACES data for resource allocation. The National Center for HIV, Viral Hepatitis, STD, and TB Prevention (NCHHSTP) uses PLACES estimates of health insurance coverage and healthcare access to identify jurisdictions with high HIV risk and low prevention infrastructure, informing the allocation of prevention grants to state and local health departments. The Ending the HIV Epidemic initiative, which targets federal resources to 57 priority jurisdictions, used PLACES-derived health burden data in its original jurisdiction selection.

The Health Resources and Services Administration uses PLACES in planning for federally qualified health centers. FQHC service areas must demonstrate unmet need, and PLACES prevalence estimates for preventive care utilization—dental visits, cancer screenings, routine checkups—provide evidence of unmet need in ways that traditional socioeconomic measures alone cannot.

At the state and local level, health departments use PLACES as the foundation for Community Health Assessments and Community Health Improvement Plans (CHIPs), which are tied to public health accreditation standards and parallel the community health needs assessments that nonprofit hospitals must conduct to keep their tax-exempt status under IRS rules. A county health department building its CHIP must identify the highest-burden health conditions in its jurisdiction; PLACES provides the prevalence data that makes that identification systematic rather than anecdotal.

Academic researchers studying neighborhood health determinants use PLACES to construct health outcome variables at the tract level. Before PLACES, researchers who wanted to study the relationship between food environment and obesity, or between air quality and COPD, at the neighborhood level were limited to a handful of major cities with their own health survey programs. PLACES made that research nationally possible: any researcher can now link PLACES tract-level disease prevalence to environmental, social, economic, or policy variables available at the tract level from the ACS or other federal sources.

Health equity research has been a particularly prominent application. The PLACES data makes it straightforward to quantify the gap in health outcomes between high-income and low-income tracts, between majority-white and majority-minority tracts, between urban and rural geographies. That quantification is a prerequisite for policy arguments about addressing health disparities, and PLACES has become a standard citation in health equity reports, state health equity plans, and federal agency strategic documents.

PLACES vs. County Health Rankings

Analysts new to county health data frequently encounter both PLACES and the County Health Rankings (CHR) published annually by the Robert Wood Johnson Foundation in partnership with the University of Wisconsin Population Health Institute. The two products address related but distinct questions, and they are best understood as complements rather than substitutes.

PLACES produces prevalence estimates for specific health measures—what percentage of adults in this county smoke, have diabetes, received a mammogram last year. These are direct estimates of health behaviors and conditions, produced through a consistent MRP methodology applied to BRFSS data. PLACES does not produce a composite ranking; it produces measure-by-measure estimates that analysts can use in any way they choose.

County Health Rankings, by contrast, produces a composite health rank for each county within its state based on a weighted model that combines four categories of factors: health behaviors (30% weight, including smoking, obesity, physical inactivity, and alcohol use), clinical care (20%, including insurance rates, primary care provider ratios, and preventable hospital stays), social and economic factors (40%, including education, employment, income, and community safety), and physical environment (10%, including air and water quality). CHR uses multiple underlying data sources—BRFSS, Medicare claims, ACS, and others—rather than a single consistent methodology.
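The weighting scheme can be made concrete with a small calculation. The sketch below applies the four category weights quoted above to hypothetical standardized category scores; it illustrates a weighted composite in general and is not the County Health Rankings scoring code, which involves many sub-measures and its own standardization rules.

```python
# Category weights as described in the text
WEIGHTS = {
    "health_behaviors": 0.30,
    "clinical_care": 0.20,
    "social_economic": 0.40,
    "physical_environment": 0.10,
}


def composite_score(category_scores, weights=WEIGHTS):
    """Weighted sum of standardized category scores."""
    missing = set(weights) - set(category_scores)
    if missing:
        raise KeyError(f"missing categories: {sorted(missing)}")
    return sum(weights[k] * category_scores[k] for k in weights)


# Hypothetical z-scores for one county (sign convention is an assumption)
county_scores = {
    "health_behaviors": 1.0,
    "clinical_care": 0.5,
    "social_economic": -0.5,
    "physical_environment": 0.0,
}

print(f"composite: {composite_score(county_scores):.2f}")
```

Because social and economic factors carry the largest weight, a modest score in that category can offset stronger scores elsewhere, which is one reason a composite rank and a single PLACES measure can tell different stories about the same county.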

The practical differences for analysts: PLACES provides sub-county geography (census tracts and ZCTAs); CHR is county-only. PLACES uses a consistent methodology that makes year-over-year and cross-county comparisons of specific measures reliable; CHR's composite score methodology changes periodically, complicating trend analysis. PLACES is better for studying specific health conditions in geographic detail; CHR is better for understanding the overall health environment of a county as a composite of health, clinical care, and social factors. Researchers who need to understand why health outcomes differ across geographies should use both: PLACES for the outcome measures themselves, CHR for the social and economic context.

Data Access

PLACES data is published through CDC's open data portal at data.cdc.gov. The Socrata API exposes the PLACES Local Data for Better Health datasets, organized by geography level: one dataset for counties, one for census tracts, and one for ZCTAs. Each dataset has a Socrata dataset ID that can be used to construct API queries in the Socrata Query Language (SoQL). The county-level dataset ID as of recent releases is swc5-untb; the tract-level dataset is yjkw-uj5s; the ZCTA-level dataset is qnzd-25i4. Verify these IDs on the data.cdc.gov portal before use, as Socrata dataset IDs can change with major releases.
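A query URL can be assembled from the dataset ID and a few SoQL parameters without any third-party library. The sketch below only builds the URL and makes no network request; the helper name `build_places_url` and the chosen columns are illustrative, and the dataset ID should be verified as noted above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

BASE = "https://data.cdc.gov/resource/{dataset_id}.json"


def build_places_url(dataset_id, state=None, columns=None, limit=1000):
    """Build a Socrata query URL using $where, $select, and $limit."""
    params = {"$limit": limit}
    if state is not None:
        # Accept only two-letter codes so the value is safe to quote in SoQL
        if not (len(state) == 2 and state.isalpha()):
            raise ValueError("state must be a two-letter abbreviation")
        params["$where"] = f"stateabbr='{state.upper()}'"
    if columns:
        params["$select"] = ",".join(columns)
    return BASE.format(dataset_id=dataset_id) + "?" + urlencode(params)


url = build_places_url(
    "swc5-untb",  # county dataset ID from the text; verify before use
    state="ms",
    columns=["locationid", "measure", "data_value"],
    limit=5000,
)
print(url)
print(parse_qs(urlparse(url).query))
```

The `$` in parameter names is percent-encoded in the printed URL; that is the same encoding an HTTP client library applies when it is given the parameters as a dictionary.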

Each row in the PLACES Socrata datasets represents one measure for one geography in one release year. The key columns are locationname (county or tract identifier), locationid (FIPS code), stateabbr, category (the five domain names), measure (the specific health measure), data_value (the prevalence estimate), and low_confidence_limit/high_confidence_limit (the 95 percent confidence interval bounds). The data_value_type field distinguishes crude prevalence from age-adjusted prevalence; most behavioral measures are available in both forms.
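Since the JSON API delivers field values as strings, it helps to convert each row into a typed record before analysis. The sketch below uses the column names listed above; the sample row's values are invented for illustration, and the `PlacesEstimate` class is a hypothetical convenience, not part of any CDC library.

```python
from dataclasses import dataclass


@dataclass
class PlacesEstimate:
    fips: str          # kept as text so leading zeros survive
    measure: str
    value_type: str
    value: float       # prevalence, percent
    ci_low: float
    ci_high: float


def parse_row(row):
    """Convert one API row (dict of strings) into a typed record."""
    return PlacesEstimate(
        fips=row["locationid"],
        measure=row["measure"],
        value_type=row["data_value_type"],
        value=float(row["data_value"]),
        ci_low=float(row["low_confidence_limit"]),
        ci_high=float(row["high_confidence_limit"]),
    )


# Invented example row; real values will differ
sample_row = {
    "locationid": "01001",
    "measure": "Obesity among adults",
    "data_value_type": "Crude prevalence",
    "data_value": "38.5",
    "low_confidence_limit": "33.1",
    "high_confidence_limit": "44.0",
}

record = parse_row(sample_row)
print(record)
print("crude" in record.value_type.lower())
```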

For bulk analysis, CDC also publishes CSV downloads of the full PLACES datasets on its open data portal. The county CSV is manageable (roughly 3,100 counties × 36 measures ≈ 110,000 rows); the tract CSV is substantially larger (29,000 tracts × 29 measures available at tract level ≈ 840,000 rows) and benefits from chunked reading with pandas. GeoJSON API access is available through the Socrata endpoint at data.cdc.gov/resource/[dataset-id].geojson with SoQL filters, enabling direct import into GIS tools. CDC also publishes an ArcGIS REST API through its GeoServer on CDC's ArcGIS Online instance, which supports spatial queries by bounding box and state.
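Chunked reading keeps memory use bounded by filtering each piece of the file as it is read. The sketch below demonstrates the pattern on a tiny in-memory CSV so it runs without a download; the column names follow the API fields described above and are an assumption for the CSV export, whose headers may be capitalized differently.

```python
import io

import pandas as pd


def load_state(csv_source, state, chunksize=100_000):
    """Read a large PLACES CSV in chunks, keeping only one state's rows."""
    parts = []
    reader = pd.read_csv(
        csv_source,
        chunksize=chunksize,
        dtype={"locationid": str},  # preserve leading zeros in FIPS codes
    )
    for chunk in reader:
        parts.append(chunk[chunk["stateabbr"] == state])
    return pd.concat(parts, ignore_index=True)


# Tiny stand-in for the tract CSV (invented values)
demo_csv = io.StringIO(
    "stateabbr,locationid,measure,data_value\n"
    "AL,01001020100,Obesity among adults,36.1\n"
    "MS,28053950100,Obesity among adults,45.2\n"
    "MS,28053950200,Obesity among adults,44.0\n"
    "CO,08013012100,Obesity among adults,18.9\n"
    "AL,01001020200,Obesity among adults,37.4\n"
)

alabama = load_state(demo_csv, "AL", chunksize=2)
print(alabama)
```

For the real file, pass the path of the downloaded CSV instead of the `StringIO` object and leave `chunksize` at a value in the tens or hundreds of thousands of rows.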

The Python sodapy library provides a client for the Socrata API and simplifies authentication, pagination, and SoQL construction. For most analyses, direct calls to the JSON API endpoint with the requests library are equally convenient and avoid adding a dependency.
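When calling the endpoint directly, pagination has to be handled by hand. The sketch below shows one way to structure it, with the page-fetching function injected so the loop can be exercised without a network connection; `fetch_all` and `make_fake_fetcher` are hypothetical helpers. In real use the fetcher would issue a request with the SoQL `$limit` and `$offset` parameters, plus a stable `$order` so pages do not shift between calls.

```python
def fetch_all(fetch_page, page_size=1000):
    """Collect rows page by page until a short (or empty) page is returned.

    fetch_page(limit, offset) must return a list of rows.
    """
    rows = []
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size


def make_fake_fetcher(total_rows):
    """Stand-in for an HTTP call: serves slices of an in-memory list."""
    data = [{"row": i} for i in range(total_rows)]

    def fetch_page(limit, offset):
        return data[offset:offset + limit]

    return fetch_page


all_rows = fetch_all(make_fake_fetcher(2500), page_size=1000)
print(len(all_rows))
```

Stopping on a short page means one extra request is made when the total happens to be an exact multiple of the page size, which is a small price for not needing to know the row count in advance.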

Python: Analyzing Mississippi County Health Burden

The following script downloads CDC PLACES county-level data for Mississippi via the Socrata API, extracts obesity, diabetes, and physical inactivity prevalences, computes the correlation matrix among the three measures, identifies the 10 counties with the highest composite chronic-disease burden, and produces a scatter plot of diabetes versus obesity prevalence with state average reference lines. Mississippi is used as the example because its counties span a wide range of burden levels and the Delta counties represent some of the highest chronic disease prevalences in the country.

import requests
import pandas as pd
import matplotlib.pyplot as plt

# ---------------------------------------------------------------
# CDC PLACES county-level data via Socrata API (data.cdc.gov)
# Dataset: PLACES Local Data for Better Health, County Data
# Socrata dataset ID: swc5-untb  (verify at data.cdc.gov if changed)
# ---------------------------------------------------------------

SOCRATA_BASE = "https://data.cdc.gov/resource/swc5-untb.json"

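# Filter to Mississippi (state abbreviation = "MS")
# Request all rows for the state using SoQL $where and $limit.
# 82 counties x ~36 measures x 2 value types (crude and age-adjusted)
# exceeds 5,000 rows, so use a limit with comfortable headroom.
params = {
    "$where": "stateabbr='MS'",
    "$limit": 50000,
    "$order": "locationname ASC",
}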

resp = requests.get(SOCRATA_BASE, params=params, timeout=60)
resp.raise_for_status()
raw = resp.json()

# Each row is one measure for one county in one release year.
# Key columns: locationname (county name), locationid (FIPS),
# category, measure, data_value (prevalence %), data_value_type,
# stateabbr, year, low_confidence_limit, high_confidence_limit
df = pd.DataFrame(raw)
print("Rows fetched:", len(df))
print("Columns:", list(df.columns))

# Convert prevalence to numeric
df["data_value"] = pd.to_numeric(df["data_value"], errors="coerce")

# ---------------------------------------------------------------
# Pivot to wide format: one row per county, measures as columns
# Focus on three correlated measures: obesity, diabetes, physical inactivity
# ---------------------------------------------------------------

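# Output column name -> pattern matched against the "measure" text.
# Measure wording varies by release, so match on keywords, not exact labels.
MEASURES = {
    "OBESITY":  "obesity",
    "DIABETES": "diabetes",
    # Matches "Physical inactivity" and "No leisure-time physical activity"
    "LPA":      r"physical (?:in)?activity",
}

def extract_measure(df, column, pattern):
    """Return one crude-prevalence row per county for the matched measure."""
    subset = df[df["measure"].str.contains(pattern, case=False, na=False)]
    # County data carries crude and age-adjusted rows; keep crude only
    is_crude = subset["data_value_type"].str.contains("crude", case=False, na=False)
    subset = subset[is_crude]
    # Drop duplicates keeping the most recent year per county
    subset = subset.sort_values("year", ascending=False).drop_duplicates("locationid")
    return subset[["locationid", "locationname", "data_value"]].rename(
        columns={"data_value": column}
    )

obesity    = extract_measure(df, "OBESITY", MEASURES["OBESITY"])
diabetes   = extract_measure(df, "DIABETES", MEASURES["DIABETES"])
inactivity = extract_measure(df, "LPA", MEASURES["LPA"])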

# Merge all three on county FIPS
wide = obesity.merge(diabetes, on=["locationid", "locationname"])
wide = wide.merge(inactivity, on=["locationid", "locationname"])
wide = wide.dropna(subset=["OBESITY", "DIABETES", "LPA"])

print("\nCounties with all three measures: " + str(len(wide)))

# ---------------------------------------------------------------
# Correlation matrix between the three measures
# ---------------------------------------------------------------

corr = wide[["OBESITY", "DIABETES", "LPA"]].corr()
print("\nCorrelation matrix (Mississippi counties):")
print(corr.round(3).to_string())

# ---------------------------------------------------------------
# Identify the 10 counties with the highest combined burden
# Composite score = mean of the three standardized z-scores
# ---------------------------------------------------------------

for col in ["OBESITY", "DIABETES", "LPA"]:
    wide[col + "_z"] = (wide[col] - wide[col].mean()) / wide[col].std()

wide["composite_z"] = wide[["OBESITY_z", "DIABETES_z", "LPA_z"]].mean(axis=1)
top10 = wide.nlargest(10, "composite_z")[
    ["locationname", "OBESITY", "DIABETES", "LPA", "composite_z"]
].reset_index(drop=True)

print("\nTop 10 Mississippi counties by combined chronic-disease burden:")
print(top10.to_string(index=False))

# ---------------------------------------------------------------
# Scatter plot: diabetes vs. obesity, with state average reference lines
# ---------------------------------------------------------------

state_obesity_avg   = wide["OBESITY"].mean()
state_diabetes_avg  = wide["DIABETES"].mean()

fig, ax = plt.subplots(figsize=(9, 7))

ax.scatter(
    wide["OBESITY"],
    wide["DIABETES"],
    s=40,
    alpha=0.7,
    color="#0b4a8f",
    label="MS county",
)

# Highlight the 10 highest-burden counties
ax.scatter(
    top10["OBESITY"],
    top10["DIABETES"],
    s=80,
    color="#c0392b",
    zorder=5,
    label="Top 10 burden counties",
)

for _, row in top10.iterrows():
    ax.annotate(
        row["locationname"].replace(" County, Mississippi", ""),
        (row["OBESITY"], row["DIABETES"]),
        textcoords="offset points",
        xytext=(4, 4),
        fontsize=7,
    )

# State average reference lines
ax.axvline(state_obesity_avg, color="#888", linestyle="--", linewidth=0.9,
           label=f"State avg obesity ({state_obesity_avg:.1f}%)")
ax.axhline(state_diabetes_avg, color="#aaa", linestyle=":",  linewidth=0.9,
           label=f"State avg diabetes ({state_diabetes_avg:.1f}%)")

ax.set_xlabel("Adult obesity prevalence (%)")
ax.set_ylabel("Diagnosed diabetes prevalence (%)")
ax.set_title("CDC PLACES: Diabetes vs. Obesity by County — Mississippi")
ax.legend(fontsize=8)
fig.tight_layout()
plt.savefig("ms_places_scatter.png", dpi=150)
print("\nScatter plot saved to ms_places_scatter.png")

The correlation analysis typically reveals a strong positive correlation between obesity and diabetes (r ≈ 0.7–0.8 across Mississippi counties), a strong positive correlation between physical inactivity and obesity (r ≈ 0.6–0.7), and a moderate positive correlation between physical inactivity and diabetes (r ≈ 0.5–0.6). The composite burden ranking concentrates the highest-burden counties in the Delta region: Humphreys, Sharkey, Holmes, Leflore, and Sunflower counties consistently rank near the top of all three measures simultaneously. The scatter plot makes visible the upper-right cluster of Delta counties that sit well above the state averages for both conditions, separated from the remainder of the state by a clear gap.

To extend the analysis nationally, remove the $where parameter entirely and increase $limit to 200,000 to capture all counties. At the national scale, the geographic patterns described above (the obesity belt in Appalachia and the Delta, the diabetes concentration in the Southeast, the smoking gradient along the Appalachian corridor) become clearly visible and can be mapped directly by joining the PLACES FIPS codes to Census Bureau county shapefiles.
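Joins on FIPS codes commonly fail when one side has lost its leading zeros, for example after a spreadsheet or an integer conversion. The sketch below normalizes county codes to five-character strings before a join; the helper names are illustrative, and the name of the matching identifier column in a given shapefile is something to confirm from that file's documentation.

```python
def normalize_county_fips(value):
    """Return a 5-character county FIPS string, restoring leading zeros."""
    digits = str(value).strip()
    if not digits.isdigit() or len(digits) > 5:
        raise ValueError(f"not a county FIPS code: {value!r}")
    return digits.zfill(5)


def state_fips(county_fips):
    """First two digits of a county FIPS code identify the state."""
    return normalize_county_fips(county_fips)[:2]


# Codes as they might arrive from different sources
raw_codes = [1001, "28053", " 8013 "]
clean_codes = [normalize_county_fips(c) for c in raw_codes]

print(clean_codes)
print([state_fips(c) for c in clean_codes])
```

Applying the same normalization to both the PLACES locationid column and the shapefile's county identifier before merging avoids silently dropping every county in states whose codes begin with zero.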

For the Census Bureau population estimates and demographic data that serve as the poststratification base for PLACES MRP models, see Census Population Estimates Program.

For USDA FSIS food safety recall and establishment data, including pathogen testing results by county that can be combined with PLACES food insecurity measures, see USDA FSIS Food Safety Data.