
HUD Point-in-Time Count: The Federal Homeless Census Behind 650,000 Americans Without Shelter

October 2, 2026 · AI Analytics

HUD · Housing · Social Policy · Federal Data

On a single night each January, volunteer counters fan out across every major American city and most rural counties to tally every person sleeping in a shelter, a motel paid for by a voucher program, or on the street. The resulting figure — the Point-in-Time count — is the federal government's official measure of homelessness in the United States. In 2023 that count reached 653,100 people, the highest total since HUD began systematic national reporting and a 12% increase from 2022. Understanding what the PIT count measures, how it is organized, what the underlying data infrastructure looks like, and where its significant methodological limits lie is essential for anyone working with federal homelessness data.

The annual count: structure and mandate

The Point-in-Time count has been mandatory for communities receiving HUD Continuum of Care funding since 2005. Every community must conduct an unsheltered count at least every other year, but most conduct one annually. HUD specifies a counting window of the last ten days of January — a deliberate choice to use a cold-weather period when unsheltered homelessness is expected to be near its seasonal low, on the theory that a minimum-count baseline is more reproducible than a summer count when weather might attract outdoor sleeping by people who have other options.

The count is organized into two distinct components. The sheltered count tallies people staying in emergency shelters, transitional housing programs, and safe havens on the night of the count. Because shelter providers maintain intake records, the sheltered count is generally more complete than the unsheltered count and can draw on administrative data rather than direct observation. The unsheltered count is the methodologically challenging component: volunteers walk predetermined street segments, parks, encampments, and known congregating spots and record every person they encounter who appears to be sleeping outside, in a vehicle, or in a structure not meant for human habitation.

Approximately 400 Continuum of Care (CoC) regions conduct the count each year. A CoC is HUD's geographic organizational unit for homeless services: each CoC covers a defined geography — typically a city, county, or multi-county rural region — and is administered by a lead agency that coordinates local homeless service providers, submits the PIT count to HUD, and applies for competitive federal funding. CoC boundaries do not follow any other federal geographic boundary; they are defined by local planning decisions and HUD approval.

Scale and national trends

The 2023 PIT count of 653,100 was the highest in the modern reporting era and represented a substantial departure from the declining trend that had characterized the prior decade. From a peak of approximately 647,000 in 2007, national homelessness had declined to roughly 553,000 by 2017 as Housing First policies spread, veteran homelessness programs ramped up, and rapid rehousing funding expanded under the Obama administration. The trend then reversed: counts rose gradually from 2017 through 2022 and jumped sharply in 2023.

Year	Total Homeless	Unsheltered	% Unsheltered
2017	553,742	192,875	35%
2019	567,715	211,293	37%
2020	580,466	226,080	39%
2022	582,462	225,641	39%
2023	653,100	~256,000	~39%
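The shares and the year-over-year change quoted above can be recomputed from the counts in the table. The following is a minimal sketch that uses only the table's figures; the 2023 unsheltered value is the rounded ~256,000 shown there, so its share is approximate.

```python
# Counts copied from the table above: year -> (total, unsheltered).
# The 2023 unsheltered figure is the rounded value from the table.
pit = {
    2017: (553_742, 192_875),
    2019: (567_715, 211_293),
    2020: (580_466, 226_080),
    2022: (582_462, 225_641),
    2023: (653_100, 256_000),
}


def unsheltered_share(total, unsheltered):
    """Unsheltered count as a percentage of the total count."""
    return 100 * unsheltered / total


def pct_change(old, new):
    """Percentage change from old to new."""
    return 100 * (new - old) / old


shares = {year: round(unsheltered_share(t, u)) for year, (t, u) in pit.items()}
change_2022_2023 = pct_change(pit[2022][0], pit[2023][0])

for year, share in shares.items():
    print(year, f"{share}% unsheltered")
print(f"2022 to 2023 change in total: {change_2022_2023:.1f}%")
```

The rounded shares match the table's last column, and the 2022 to 2023 change comes out at about 12%, consistent with the increase cited in the introduction.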

California dominates the national count, accounting for approximately 28% of the national total (roughly 180,000 people) despite representing only 12% of the national population. The two largest CoCs by total homeless count are New York City and Los Angeles City and County: in 2023 New York City reported approximately 88,000 people and Los Angeles approximately 75,500. The two counts differ structurally: New York has a legal right to shelter, so its homeless population is overwhelmingly sheltered (over 90%), while Los Angeles's unsheltered rate exceeds 60%. This difference reflects local policy choices as much as the underlying scale of housing need.

The roughly 40% unsheltered share of the national total — approximately 256,000 people sleeping outside on the count night — is the metric most sensitive to methodological variation. West Coast states (California, Oregon, Washington) have dramatically higher unsheltered rates than East Coast cities with right-to-shelter policies. The unsheltered fraction has been rising since 2017 as shelter capacity has not kept pace with rising homelessness in high-cost markets.

The Continuum of Care system

The CoC system is HUD's primary organizational framework for homeless services. Each CoC is a geographic catchment area covering a defined community — urban CoCs typically cover a single city or metro, while rural CoCs can span multiple counties or an entire state. There are approximately 400 CoCs in the 50 states, the District of Columbia, and Puerto Rico.

CoCs serve two related functions. First, they conduct the annual PIT count and submit data to HUD through the HDX (Homeless Data Exchange). Second, they apply for competitive federal funding through the McKinney-Vento Homeless Assistance Grants, the primary federal funding stream for homeless services. McKinney-Vento grants include the Continuum of Care Program, Emergency Solutions Grants (ESG), and the Youth Homelessness Demonstration Program. The CoC competition is an annual process in which communities submit consolidated applications describing their homeless services system, performance outcomes, and planned use of funds. HUD scores applications and awards funding competitively, with higher-performing CoCs receiving preference in subsequent years.

Every CoC receiving McKinney-Vento funding is required to implement a Homeless Management Information System (HMIS), a longitudinal database that tracks individual homeless service utilization. The HMIS requirement is the cornerstone of HUD's data infrastructure for measuring system performance and distributing federal homeless services funding.

HMIS: the longitudinal data system

HMIS is a client-level database that records every interaction between a homeless individual and the service system within a CoC's geography. When a person enters an emergency shelter, transitional housing program, rapid rehousing program, or permanent supportive housing program, a record is created in the local HMIS. The record includes intake assessment data, program enrollment and exit dates, services received during enrollment, and housing destination at exit. Because HMIS tracks individuals longitudinally — not just program enrollments — it is possible to follow a person's trajectory through multiple programs over time.

HUD publishes comprehensive HMIS Data Standards that specify the universal data elements every CoC must collect: name, Social Security Number (or a partial SSN for de-identification), date of birth, race, ethnicity, gender, veteran status, chronic homelessness status, and disabling condition. Project-level data elements add program-specific intake and exit assessments, income and benefit information, and housing situation at entry and exit.
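One way to picture the client-level, longitudinal structure described above is a client record that owns a list of program enrollments. The sketch below is illustrative only: the class names, field names, and project-type codes are assumptions chosen for readability, not the schema defined in the HMIS Data Standards.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Enrollment:
    """One stay in one program (hypothetical field names)."""
    project_type: str                       # illustrative codes: "ES", "TH", "RRH", "PSH"
    entry_date: date
    exit_date: Optional[date] = None        # None while the person is still enrolled
    exit_destination: Optional[str] = None  # housing situation recorded at exit


@dataclass
class Client:
    """A few universal data elements plus the enrollments that reference them."""
    client_id: str                          # locally generated identifier
    date_of_birth: date
    veteran: bool
    disabling_condition: bool
    enrollments: List[Enrollment] = field(default_factory=list)

    def trajectory(self) -> List[Enrollment]:
        """Enrollments in chronological order of entry."""
        return sorted(self.enrollments, key=lambda e: e.entry_date)


client = Client("c-0001", date(1980, 5, 17), veteran=False, disabling_condition=True)
client.enrollments.append(Enrollment("RRH", date(2023, 3, 1)))
client.enrollments.append(
    Enrollment("ES", date(2023, 1, 12), date(2023, 3, 1), "rapid rehousing")
)
path = [e.project_type for e in client.trajectory()]
print(path)
```

Keeping enrollments attached to a single client identifier is what makes it possible to follow one person from emergency shelter into rapid rehousing, rather than seeing two unrelated program records.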

HMIS data enables several categories of system performance analysis that the annual PIT count cannot. Returns to homelessness (the percentage of people who exit to permanent housing and then return to the homeless services system within 6, 12, or 24 months) are a primary HUD system performance measure. Length of stay in each program type, average time from homelessness onset to housing placement, and housing placement rates (the percentage of exits from a program that result in permanent housing) are all derivable from HMIS data at the CoC level.
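As a rough illustration of how two of these measures fall out of enrollment data, the sketch below computes a housing placement rate and a mean length of stay from a handful of made-up exit records. The record layout is an assumption for this example, not an HMIS export format.

```python
from datetime import date

# Hypothetical exit records: (entry_date, exit_date, exited_to_permanent_housing)
exit_records = [
    (date(2023, 1, 10), date(2023, 3, 11), True),
    (date(2023, 2, 1), date(2023, 2, 15), False),
    (date(2023, 4, 5), date(2023, 7, 4), True),
    (date(2023, 5, 20), date(2023, 6, 19), True),
]


def placement_rate(records):
    """Percentage of exits whose destination was permanent housing."""
    permanent = sum(1 for _entry, _exit, to_permanent in records if to_permanent)
    return 100 * permanent / len(records)


def mean_length_of_stay(records):
    """Average number of days between program entry and exit."""
    return sum((exit_ - entry).days for entry, exit_, _ in records) / len(records)


print(f"Placement rate: {placement_rate(exit_records):.1f}%")
print(f"Mean length of stay: {mean_length_of_stay(exit_records):.1f} days")
```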

De-identification is a persistent challenge in HMIS. The standard approach uses a client-level unique identifier generated within the HMIS software, allowing tracking across programs within a single CoC. Cross-CoC tracking — following a person who moves between communities — is technically difficult because HMIS implementations are local, and there is no national HMIS database with individual-level records. HUD receives aggregate reports (the Annual Performance Report, or APR) from each funded project, not individual-level data. This architecture protects client privacy but limits the ability to detect serial cross-jurisdiction service utilization.
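The within-CoC versus cross-CoC distinction can be illustrated with a salted hash. This is only a sketch of the general idea: the `pseudonym` function, its inputs, and the per-CoC salt are assumptions made for this example, and it does not describe how any particular HMIS product generates identifiers.

```python
import hashlib


def pseudonym(name: str, dob: str, ssn_last4: str, coc_salt: str) -> str:
    """Salted SHA-256 digest of normalized identifying fields, truncated for readability."""
    raw = f"{name.strip().lower()}|{dob}|{ssn_last4}"
    return hashlib.sha256((coc_salt + raw).encode("utf-8")).hexdigest()[:16]


# Same person, same CoC: the identifier is stable across programs,
# even when the name is entered with different spacing or capitalization.
same_coc_a = pseudonym("Jane Doe", "1980-05-17", "1234", coc_salt="salt-for-coc-A")
same_coc_b = pseudonym(" jane doe ", "1980-05-17", "1234", coc_salt="salt-for-coc-A")

# Same person, different CoC with its own salt: the identifiers no longer match.
other_coc = pseudonym("Jane Doe", "1980-05-17", "1234", coc_salt="salt-for-coc-B")

print(same_coc_a == same_coc_b)
print(same_coc_a == other_coc)
```

Under this assumed scheme, records link cleanly inside one system but cannot be joined across systems without sharing the underlying identifying fields, which is the trade-off between privacy and cross-jurisdiction tracking described above.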

The Coordinated Entry system, now required in all CoCs receiving HUD funding, adds another layer to HMIS. Coordinated Entry is a standardized process for assessing and prioritizing homeless individuals for available housing interventions. It generates assessment data — typically using validated instruments like the VI-SPDAT (Vulnerability Index — Service Prioritization Decision Assistance Tool) or similar — that feeds into HMIS. The combination of coordinated entry assessment data and longitudinal HMIS tracking enables more sophisticated predictive modeling of housing outcomes than either system would allow alone.

Subpopulation data

The PIT count collects subpopulation data in addition to the overall homeless count. The principal subpopulations tracked are veterans, chronically homeless individuals, families with children, youth (ages 18–24), unaccompanied youth, and people with specific disabling conditions. Race and gender breakdowns are also reported.

Veterans

The 2023 PIT count identified approximately 37,000 homeless veterans, the vast majority of whom are male (roughly 90%) and single (not in families with children). Veteran homelessness peaked at approximately 74,000 in 2010 and declined sharply through 2016 due to the HUD-VASH (HUD–VA Supportive Housing) program, which combines HUD Housing Choice Vouchers with VA case management services in a Housing First model. HUD-VASH has placed more than 150,000 veterans in permanent housing since its inception. The program is jointly administered by HUD and the VA and is one of the most rigorously evaluated federal homeless intervention programs.

Despite the progress, veteran homelessness has plateaued and begun rising since 2020. The demographic profile of homeless veterans has shifted: while the Vietnam-era veteran population that drove the 1990s–2000s homeless veteran crisis has aged out, post-9/11 veterans are increasingly represented, particularly in younger age cohorts with co-occurring mental health and substance use disorders.

Chronically homeless individuals

HUD defines a chronically homeless individual as a person with a disabling condition (mental illness, substance use disorder, physical disability, or chronic health condition) who has been continuously homeless for 12 months or more, or who has experienced at least four separate episodes of homelessness in the past three years that together total at least 12 months. The 2023 count identified approximately 121,000 chronically homeless individuals, the highest figure recorded.
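The definition above is a two-branch rule gated on a disabling condition, which can be written as a small predicate. The function and parameter names are hypothetical, and the sketch encodes only the criteria as stated in this article.

```python
def is_chronically_homeless(
    has_disabling_condition: bool,
    continuous_months: int,
    episodes_past_3_years: int,
    episode_months_total: int,
) -> bool:
    """Apply the chronic homelessness criteria as summarized in the text."""
    if not has_disabling_condition:
        return False
    long_continuous = continuous_months >= 12
    repeated_episodes = episodes_past_3_years >= 4 and episode_months_total >= 12
    return long_continuous or repeated_episodes


# Continuously homeless for 14 months, with a disabling condition.
print(is_chronically_homeless(True, 14, 1, 14))
# Four episodes in three years, but only 9 months in total.
print(is_chronically_homeless(True, 3, 4, 9))
# Long-term homeless without a disabling condition.
print(is_chronically_homeless(False, 24, 1, 24))
```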

Chronically homeless individuals consume a disproportionate share of homeless service resources and emergency services. Research by the Corporation for Supportive Housing has documented that chronically homeless individuals use emergency departments, psychiatric inpatient units, and jail beds at rates far exceeding those of the general low-income population, and that the total public cost of an episode of chronic homelessness — counting shelter, emergency services, and correctional system costs — typically exceeds the cost of providing permanent supportive housing with intensive services. This cost-offset evidence is the primary empirical foundation for prioritizing chronically homeless individuals for Housing First permanent supportive housing.

Families with children, youth, and gender breakdown

The 2023 count identified approximately 186,000 people in families with children, of whom roughly 40,000 were unsheltered. Families are predominantly sheltered because emergency family shelter is a HUD-funded priority. The 18–24 youth subpopulation numbered approximately 38,000, with unaccompanied youth — young people without a parent or guardian — representing the majority. Youth homelessness is believed to be significantly undercounted relative to adult homelessness because young people tend to avoid formal shelter systems and instead couch-surf, stay in cars, or accept risky housing situations that do not qualify as literal homelessness under HUD definitions.

Gender data in the 2023 count reflects HUD's expanded gender categories, which now report transgender and gender non-conforming individuals as distinct categories. Women represent approximately 37% of the total homeless population. The separate reporting, introduced only in recent counts, reveals meaningful overrepresentation of transgender and gender non-conforming individuals relative to their estimated share of the general population.

Racial disparities in the PIT count are significant. Black Americans represent approximately 37% of the homeless population while comprising approximately 13% of the general population — a disparity ratio approaching 3:1. American Indian and Alaska Native individuals are overrepresented at comparable ratios. White non-Hispanic individuals are underrepresented relative to their share of the general population in most urban CoCs.
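The disparity ratio is simply a group's share of the homeless population divided by its share of the general population. A minimal sketch using the percentages quoted in this article (the function name is an assumption):

```python
def representation_ratio(share_of_homeless: float, share_of_population: float) -> float:
    """How many times over- or under-represented a group is (1.0 means proportional)."""
    return share_of_homeless / share_of_population


# Black Americans: about 37% of the homeless population, about 13% of the general population.
black_ratio = representation_ratio(37, 13)

# California: about 28% of the national count, about 12% of the national population.
california_ratio = representation_ratio(28, 12)

print(f"Black Americans: {black_ratio:.2f}x")
print(f"California: {california_ratio:.2f}x")
```

The same calculation applies to any subgroup or geography, as long as both shares are measured against the same base population.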

Methodological limitations

HUD, academic researchers, and CoC practitioners have extensively documented the PIT count's methodological limitations. Understanding them is essential for interpreting trend data and cross-jurisdictional comparisons.

The one-night January snapshot

The count captures a single night in late January. Weather on the count night materially affects unsheltered counts: an unusually cold or rainy night may drive people into shelter or make outdoor sleeping less visible; mild weather may produce a higher unsheltered count. This weather-sensitivity means that year-over-year changes of a few percentage points in the unsheltered count may reflect counting conditions as much as actual changes in homelessness.

January is also a low point in the seasonal cycle of homelessness: people who are marginally housed make greater efforts to find temporary accommodations in winter, so the PIT count is likely to undercount the peak population that experiences homelessness at some point during the year. HUD's own Annual Homeless Assessment Report (AHAR) addresses this by also reporting annual estimates of people who use shelter over the course of a full year — a figure roughly double the January PIT count.

Volunteer counter variation

The unsheltered count depends on volunteer teams walking street segments. The quality of the count varies with the number of volunteers, the density of coverage, the training provided, and local knowledge of where unsheltered individuals congregate. A CoC that expands its count geography or recruits significantly more volunteers in a given year may produce a larger count that reflects improved coverage rather than a genuine increase in homelessness. Year-over-year changes in count methodology are tracked but not always clearly communicated in aggregate trend data.

Definitional boundaries

HUD's definition of homelessness for the PIT count is literal homelessness: people sleeping outside or in shelter. It explicitly excludes people who are “doubled up” — staying temporarily with friends or family because they have no housing of their own — as well as people living in severely overcrowded housing, people exiting institutions without housing lined up, and people at risk of imminent homelessness. The McKinney-Vento Education for Homeless Children and Youth Act uses a substantially broader definition for school-enrollment purposes, producing homeless youth counts several times larger than the PIT count from the same communities.

The exclusion of doubled-up households is particularly significant for policy. Research by the Urban Institute and others estimates that the doubled-up population is three to five times larger than the literally homeless population and is highly vulnerable to literal homelessness. Policies designed to prevent literal homelessness — prevention programs, emergency rental assistance, rapid rehousing — are implicitly targeting the doubled-up population but do not show up in PIT count trends.

The definition of “sheltered” also has boundary conditions worth noting. Transitional housing programs (time-limited housing with supportive services, typically lasting up to 24 months) are counted in the sheltered PIT count. Emergency shelters count anyone who slept there on the count night. Safe havens (low-barrier housing for persons with severe mental illness who have been unwilling to use shelter) are included. Hotel and motel rooms paid for by an emergency services voucher program count as sheltered. People sleeping in cars or other vehicles are unsheltered by definition.
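The boundary rules in this section amount to a lookup from sleeping location to PIT category. The sketch below encodes them as described in this article; the location labels are hypothetical and are not codes from any HUD specification.

```python
# Location labels are invented for this example.
SHELTERED = {
    "emergency_shelter",
    "transitional_housing",
    "safe_haven",
    "voucher_hotel_motel",
}
UNSHELTERED = {
    "street",
    "encampment",
    "vehicle",
    "structure_not_for_habitation",
}


def pit_category(location: str) -> str:
    """Return 'sheltered', 'unsheltered', or 'not counted' for a sleeping location."""
    if location in SHELTERED:
        return "sheltered"
    if location in UNSHELTERED:
        return "unsheltered"
    # Doubled-up and overcrowded households fall outside literal homelessness.
    return "not counted"


for place in ["safe_haven", "vehicle", "doubled_up"]:
    print(place, "->", pit_category(place))
```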

COVID impacts: 2020 to 2023

The January 2020 PIT count, conducted before the COVID-19 pandemic began in the United States, recorded 580,466 people — at the time, the highest figure in several years. The 2021 count was significantly disrupted by the pandemic: many CoCs scaled back their unsheltered counts or conducted them remotely, and count totals of approximately 326,000 (a drastic apparent decline) were recognized by HUD and researchers as reflecting reduced counting capacity rather than a genuine reduction in homelessness.

During 2020 and 2021, the federal government funded the Project Roomkey / Hotel-Motel Voucher program under the CARES Act and subsequent emergency appropriations. California, Texas, and several other large states converted thousands of hotel and motel rooms into emergency COVID isolation and homeless quarantine spaces. At its peak, Project Roomkey housed approximately 15,000 people in California alone. These hotel placements counted as sheltered in the PIT count, briefly expanding the national shelter capacity substantially.

The 2022–2023 surge to 653,100 had multiple contributing causes. The federal eviction moratorium, which had protected an estimated 1–2 million households from eviction during the pandemic, expired in August 2021 following the Alabama Association of Realtors v. HHS Supreme Court decision. The Emergency Rental Assistance (ERA) program disbursed approximately $46 billion to prevent evictions through 2022, but by 2023 most ERA funding was exhausted. Housing costs in major metros had increased 20–30% during the pandemic period, and inflation eroded the real value of benefits for individuals on fixed incomes.

Urban CoCs in New York, Chicago, Denver, and Boston also reported significant increases attributable to migrant arrivals — asylum seekers from Central and South America who arrived at the southern border and were bused or transported to northern cities by Texas and Florida officials beginning in 2022. New York City alone reported housing over 100,000 new arrivals through its shelter system by the end of 2023, a significant share of whom were counted in the PIT shelter count.

Housing First: the policy debate and evidence base

The dominant federal policy framework for addressing chronic homelessness since the mid-2000s has been Housing First: the principle that stable permanent housing should be provided unconditionally — without prerequisites for sobriety, treatment compliance, or employment — and that supportive services should be provided in the housing rather than as a precondition for receiving it.

Housing First emerged as a deliberate alternative to transitional housing, the previously dominant model in which homeless individuals moved through a “staircase” of progressively more independent living situations, earning housing stability through compliance with treatment and sobriety requirements. The transitional model was associated with high dropout rates — many individuals who needed housing most acutely could not or would not comply with the treatment conditions — and with permanent supportive housing being rationed to those who had already demonstrated stability.

The empirical evidence for Housing First in the US has been developed primarily through Pathways to Housing's programs in New York and through the HUD-funded Collaborative Initiative to Help End Chronic Homelessness and related demonstration programs. Studies consistently find that Housing First achieves higher housing retention rates than treatment-first models and comparable or better treatment outcomes, at similar or lower cost when emergency service utilization is counted.

The most compelling international evidence comes from Finland's Y-Foundation model, which has implemented Housing First at national scale since 2008 under Finland's “Asunto Ensin” (Housing First) national program. Finland reduced long-term homelessness by over 75% between 2008 and 2023 by converting transitional housing facilities into permanent supported apartments, creating mixed-income housing, and providing mobile support teams. Finland is the only EU country where homelessness declined consistently throughout the 2008–2020 period. The Finnish model's replicability in the US context is debated on grounds of housing market structure, welfare state breadth, and immigration patterns, but the outcome evidence is regularly invoked in federal policy discussions.

The policy debate in the US has shifted from Housing First vs. transitional housing to the question of rapid rehousing vs. permanent supportive housing. Rapid rehousing (RRH) provides short-term rental assistance and services to move people quickly from homelessness to market-rate housing. Permanent supportive housing (PSH) provides long-term or indefinite housing subsidies with intensive on-site services, primarily for chronically homeless individuals with severe disabilities. HUD's Consolidated Plan and CoC competition have increasingly scored CoCs on their balance between these interventions and on system-level performance metrics.

System performance measures

HUD has developed a standardized set of system performance measures for the CoC program, reported annually through the HMIS-derived Annual Performance Report (APR). The measures are used both to evaluate individual funded programs and to assess CoC-wide system performance in the annual CoC competition.

The primary system performance measures are:

Returns to homelessness. The percentage of people who exit to permanent housing and then re-enter the homeless services system within 6 months, 12 months, and 24 months. HUD's national targets are below 5% returns within 6 months, below 10% within 12 months, and below 20% within 24 months. Returns within 6 months are the strictest indicator of housing stability; returns within 24 months capture longer-term housing retention.
Housing placement rates. The percentage of program exits in which the destination was permanent housing — own home, rental unit, family/friend housing, or permanent supportive housing. Target rates vary by program type: PSH programs should place near 100% (since they are permanent placements by definition); RRH programs are expected to achieve 75–80% permanent housing exits.
Length of time homeless. Average time from homelessness onset (first shelter entry or street observation date) to permanent housing placement. HUD's system-level goal is to reduce the average length of homeless episodes, particularly for the chronic subpopulation.
First-time homelessness. The number of individuals experiencing homelessness for the first time in a given year, as a measure of upstream prevention effectiveness.
Successful housing placements for chronically homeless. A weighted measure that HUD uses in the CoC competition scoring specifically for the chronic subpopulation, reflecting the priority placed on that subpopulation under the HEARTH Act.
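The returns-to-homelessness measure in the list above can be sketched as a cumulative rate over three windows. The records below are made up, the window lengths in days (182, 365, 730) are an approximation chosen for this example, and the calculation ignores the question of whether each exit has been observed for the full window. The target ceilings are the ones quoted in the list.

```python
from datetime import date

# Window length in days (an approximation) and the target ceiling in percent.
WINDOWS = {"6 months": (182, 5.0), "12 months": (365, 10.0), "24 months": (730, 20.0)}

# Hypothetical records: (exit to permanent housing, first return to the system or None).
housing_exits = [
    (date(2021, 1, 15), date(2021, 4, 1)),
    (date(2021, 3, 10), date(2022, 1, 5)),
    (date(2021, 6, 20), date(2023, 1, 10)),
] + [(date(2021, 2, 1), None)] * 7


def return_rates(records):
    """Cumulative percentage of exits followed by a return within each window."""
    rates = {}
    for label, (days, _target) in WINDOWS.items():
        returned = sum(
            1
            for exit_date, back in records
            if back is not None and (back - exit_date).days <= days
        )
        rates[label] = 100 * returned / len(records)
    return rates


rates = return_rates(housing_exits)
for label, (_days, target) in WINDOWS.items():
    status = "meets" if rates[label] < target else "misses"
    print(f"{label}: {rates[label]:.1f}% ({status} the {target:.0f}% target)")
```

Because the rate only counts people who reappear in the same system, someone who returns to homelessness but never seeks services again, or who moves to another CoC, would not be counted as a return.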

HUD publishes annual system performance data for each CoC in the AHAR and through the HUD Exchange, allowing national comparisons of CoC performance. High-performing CoCs receive bonus points in the annual McKinney-Vento competition, creating a financial incentive to improve measured outcomes. Critics note that the performance measures can be gamed: CoCs can improve their housing placement rates by serving less severely disabled clients, and returns-to-homelessness rates depend on whether former clients re-appear in HMIS, which depends on whether they seek services again.

Data access: AHAR, PIT files, and HMIS aggregates

The primary public data access point is the HUD Exchange at hudexchange.info. HUD publishes several data products relevant to homelessness analysis:

PIT and HIC data since 2007. CoC-level Point-in-Time count CSV files and Housing Inventory Count (HIC) files from 2007 to the present. The HIC is the companion to the PIT count, cataloging the capacity of the homeless services system (number of shelter beds, transitional housing units, permanent supportive housing units) by program type within each CoC. Both files are available as free downloads from the HUD Exchange resource library.
Annual Homeless Assessment Report (AHAR). HUD's annual report to Congress on homelessness, published in two parts. Part 1 (released in the fall) covers the January PIT count in detail. Part 2 (released in the subsequent spring) uses HMIS data to report on the full-year sheltered homeless population and system performance. The AHAR is the most authoritative source for national trend analysis and includes state-level breakdowns.
CoC competition data. HUD publishes CoC competition awards data annually, including project-level grant amounts by CoC, allowing analysis of federal homeless services funding allocation relative to PIT count need.
HMIS aggregate data. HUD does not publish individual-level HMIS data, but CoCs are required to submit aggregate APR data through HUD's online submission system. The aggregated results by measure and CoC are published on HUD Exchange. Some CoCs publish more detailed HMIS analyses through their own dashboards; the Los Angeles Homeless Services Authority (LAHSA) and King County (Seattle) both publish extensive public HMIS reports.

CoC geographic shapefiles and crosswalk files are available from HUD Exchange, enabling the joining of PIT count data to Census geographies. The CoC–ZIP code crosswalk allows PIT data to be mapped to Census tracts or block groups for spatial analysis.
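A crosswalk join of this kind might look like the following pandas sketch, which sums tract populations up to each CoC and then computes a rate per 10,000 residents. Every column name, GEOID, and number here is invented for illustration; the actual crosswalk files may be structured differently.

```python
import pandas as pd

# Hypothetical tract-to-CoC crosswalk with made-up populations.
crosswalk = pd.DataFrame(
    {
        "tract_geoid": ["06037000001", "06037000002", "36061000001"],
        "coc_number": ["CA-600", "CA-600", "NY-600"],
        "population": [4000, 6000, 5000],
    }
)

# Hypothetical CoC-level counts (not real PIT figures).
pit_counts = pd.DataFrame(
    {"coc_number": ["CA-600", "NY-600"], "total_homeless": [50, 40]}
)

# Aggregate Census population to the CoC level, then join to the counts.
coc_population = crosswalk.groupby("coc_number", as_index=False)["population"].sum()
coc_rates = pit_counts.merge(
    coc_population, on="coc_number", how="left", validate="one_to_one"
)
coc_rates["per_10k"] = coc_rates["total_homeless"] * 10_000 / coc_rates["population"]

print(coc_rates.to_string(index=False))
```

Aggregating the Census side up to the CoC avoids splitting a single CoC-level count across smaller geographies, which the PIT data itself does not support. The `validate="one_to_one"` argument makes the merge fail loudly if a CoC number is duplicated on either side.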

Python: state-level per-capita analysis from HUD Exchange PIT data

The following script loads the HUD Exchange CoC-level PIT count CSV (available as a free download from the HUD Exchange resource library at hudexchange.info/resource/3031/), aggregates counts by state, joins to Census population estimates to compute a per-capita homeless rate, identifies the largest CoCs by total homeless, and ranks states by unsheltered share. Save the downloaded CSV as pit_2023.csv in your working directory. Install dependencies with pip install requests pandas.

import requests
import zipfile
import io
import pandas as pd

# -------------------------------------------------------
# HUD PIT Count State Analysis
# Downloads the CoC-level Point-in-Time count CSV from
# HUD Exchange, aggregates to state level, joins Census
# population estimates, and ranks states by per-capita
# homeless rate. Also identifies the largest CoCs.
#
# HUD Exchange PIT data (2007-present):
#   https://www.hudexchange.info/resource/3031/pit-and-hic-data-since-2007/
#
# The URLs below are HUD Exchange web pages, not direct
# file links, and this script does not fetch them. They
# are kept for reference only. Download the "PIT and HIC
# Data Since 2007" file manually from the page above and
# save the CoC-level sheet as pit_2023.csv.
# -------------------------------------------------------

PIT_ZIP_URL = (
    "https://www.hudexchange.info/programs/coc/coc-homeless-populations-and-subpopulations-reports/"
    "?filter_Year=2023&filter_Scope=CoC&filter_State=&filter_CoC=&program=CoC&type=data"
)

# Reference page for the 2023 AHAR Part 1 PIT estimates
# (a web page, not a direct CSV link):
PIT_CSV_URL = (
    "https://www.hudexchange.info/resource/3031/2023-ahar-part-1-pit-estimates-of-homelessness-in-the-us/"
)

# Census Bureau population estimates by state (2022 vintage, CSV)
CENSUS_POP_URL = (
    "https://www2.census.gov/programs-surveys/popest/datasets/2020-2022/state/totals/"
    "NST-EST2022-alldata.csv"
)

# -------------------------------------------------------
# Step 1: Load PIT data
# HUD Exchange distributes PIT data as Excel or CSV;
# the structure below assumes a CSV with columns including
# CoC Number, CoC Name, State, and key count columns.
# Adjust column names if working from a different vintage.
# -------------------------------------------------------

print("Loading HUD PIT count data (2023)...")

# For reproducible offline analysis, read from a local file
# if present; otherwise fall back to a small demonstration
# dataset. For real analysis, download the file from
# HUD Exchange manually and extract the relevant CSV.

# Simulated column structure matching HUD Exchange CoC PIT CSV:
# CoC Number, CoC Name, State, Overall Homeless,
# Sheltered Total Homeless, Unsheltered Homeless,
# Chronically Homeless Individuals, Veterans, ...

# Load from local CSV (rename to pit_2023.csv after download):
try:
    df = pd.read_csv("pit_2023.csv", encoding="latin-1")
    print("Loaded local pit_2023.csv")
except FileNotFoundError:
    print("pit_2023.csv not found. No download is attempted; using demonstration data.")
    # HUD Exchange requires navigating their filter interface;
    # download the file manually and save as pit_2023.csv.
    # For demonstration, we construct a minimal synthetic dataset
    # to show the analysis structure.
    print("Creating demonstration dataset with structure matching HUD PIT CSV...")
    demo_data = {
        "CoC Number": ["CA-600", "CA-501", "NY-600", "TX-700", "WA-500", "FL-506", "OR-501"],
        "CoC Name": [
            "Los Angeles City & County CoC",
            "San Francisco CoC",
            "New York City CoC",
            "Houston, Pasadena, Conroe/Harris, etc. CoC",
            "Seattle/King County CoC",
            "Tallahassee/Leon County CoC",
            "Portland-Gresham-Multnomah County CoC",
        ],
        "State": ["CA", "CA", "NY", "TX", "WA", "FL", "OR"],
        "Overall Homeless, 2023": [75518, 7754, 88025, 3276, 14357, 2634, 5530],
        "Sheltered Total Homeless, 2023": [28464, 3637, 81006, 2238, 7648, 1892, 2762],
        "Unsheltered Homeless, 2023": [47054, 4117, 7019, 1038, 6709, 742, 2768],
        "Chronically Homeless, 2023": [22885, 2063, 4933, 612, 2988, 421, 1543],
        "Homeless Veterans, 2023": [4212, 430, 1736, 189, 695, 143, 290],
        "Homeless Individuals in Families, 2023": [11503, 521, 44310, 887, 1982, 748, 619],
    }
    df = pd.DataFrame(demo_data)

# -------------------------------------------------------
# Step 2: Normalize column names
# -------------------------------------------------------
df.columns = [c.strip() for c in df.columns]

# Identify the key columns by partial match (handles year suffixes)
def find_col(df, keywords):
    """Find the first column whose name contains all keywords (case-insensitive)."""
    kw = [k.lower() for k in keywords]
    for c in df.columns:
        cl = c.lower()
        if all(k in cl for k in kw):
            return c
    return None

state_col = find_col(df, ["state"])
coc_name_col = find_col(df, ["coc", "name"])
coc_num_col = find_col(df, ["coc", "number"])
total_col = find_col(df, ["overall", "homeless"])
sheltered_col = find_col(df, ["sheltered", "total"])
unsheltered_col = find_col(df, ["unsheltered"])
chronic_col = find_col(df, ["chronically"])
veteran_col = find_col(df, ["veteran"])

required = [state_col, total_col, unsheltered_col]
if not all(required):
    print("Could not identify required columns. Available columns:")
    for c in df.columns:
        print(" ", c)
    raise SystemExit("Inspect column names above and update find_col keywords.")

def to_int(series):
    """Convert a column to numeric, coercing errors to 0."""
    return pd.to_numeric(series, errors="coerce").fillna(0).astype(int)

df["state"] = df[state_col].astype(str).str.strip().str.upper()
df["total"] = to_int(df[total_col])
df["unsheltered"] = to_int(df[unsheltered_col])
df["sheltered"] = to_int(df[sheltered_col]) if sheltered_col else df["total"] - df["unsheltered"]
df["chronic"] = to_int(df[chronic_col]) if chronic_col else 0
df["veterans"] = to_int(df[veteran_col]) if veteran_col else 0

# -------------------------------------------------------
# Step 3: Identify the 10 largest CoCs by total homeless
# -------------------------------------------------------
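if coc_name_col:
    df["coc_name"] = df[coc_name_col].astype(str).str.strip()
    # Fall back to an empty CoC number when the file has no such column,
    # so the column selection below cannot raise a KeyError.
    df["coc_number"] = df[coc_num_col].astype(str).str.strip() if coc_num_col else ""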
    top_cocs = (
        df[["coc_number", "coc_name", "state", "total", "unsheltered", "sheltered"]]
        .sort_values("total", ascending=False)
        .head(10)
        .reset_index(drop=True)
    )
    top_cocs.index += 1
    top_cocs.columns = ["CoC #", "CoC Name", "State", "Total", "Unsheltered", "Sheltered"]
    print("\n10 Largest CoCs by Total Homeless (2023):")
    print(top_cocs.to_string())

# -------------------------------------------------------
# Step 4: State-level aggregation
# -------------------------------------------------------
state_agg = (
    df.groupby("state")
    .agg(
        total=("total", "sum"),
        unsheltered=("unsheltered", "sum"),
        sheltered=("sheltered", "sum"),
        chronic=("chronic", "sum"),
        veterans=("veterans", "sum"),
    )
    .reset_index()
)

state_agg["unsheltered_pct"] = (
    state_agg["unsheltered"] / state_agg["total"].replace(0, 1) * 100
).round(1)

national_total = state_agg["total"].sum()
state_agg["share_of_national"] = (
    state_agg["total"] / national_total * 100
).round(1)

print("\nState-level PIT count totals (2023):")
print(
    state_agg.sort_values("total", ascending=False)
    .to_string(index=False)
)

# -------------------------------------------------------
# Step 5: Join to Census population for per-capita rate
# -------------------------------------------------------
print("\nDownloading Census state population estimates...")
try:
    pop_df = pd.read_csv(CENSUS_POP_URL, encoding="latin-1")
    # NST-EST file has columns: NAME, POPESTIMATE2022, STATE (FIPS), etc.
    # Map state names to 2-letter abbreviations using a lookup dict.
    state_abbr = {
        "Alabama": "AL", "Alaska": "AK", "Arizona": "AZ", "Arkansas": "AR",
        "California": "CA", "Colorado": "CO", "Connecticut": "CT", "Delaware": "DE",
        "District of Columbia": "DC", "Florida": "FL", "Georgia": "GA", "Hawaii": "HI",
        "Idaho": "ID", "Illinois": "IL", "Indiana": "IN", "Iowa": "IA",
        "Kansas": "KS", "Kentucky": "KY", "Louisiana": "LA", "Maine": "ME",
        "Maryland": "MD", "Massachusetts": "MA", "Michigan": "MI", "Minnesota": "MN",
        "Mississippi": "MS", "Missouri": "MO", "Montana": "MT", "Nebraska": "NE",
        "Nevada": "NV", "New Hampshire": "NH", "New Jersey": "NJ", "New Mexico": "NM",
        "New York": "NY", "North Carolina": "NC", "North Dakota": "ND", "Ohio": "OH",
        "Oklahoma": "OK", "Oregon": "OR", "Pennsylvania": "PA", "Rhode Island": "RI",
        "South Carolina": "SC", "South Dakota": "SD", "Tennessee": "TN", "Texas": "TX",
        "Utah": "UT", "Vermont": "VT", "Virginia": "VA", "Washington": "WA",
        "West Virginia": "WV", "Wisconsin": "WI", "Wyoming": "WY",
    }
    pop_col = "POPESTIMATE2022"
    if pop_col not in pop_df.columns:
        # Try the most recent year column available
        year_cols = [c for c in pop_df.columns if c.startswith("POPESTIMATE")]
        pop_col = sorted(year_cols)[-1] if year_cols else None

    if pop_col and "NAME" in pop_df.columns:
        pop_df["state"] = pop_df["NAME"].map(state_abbr)
        pop_df = pop_df.dropna(subset=["state"])
        pop_df = pop_df[["state", pop_col]].rename(columns={pop_col: "population"})
        pop_df["population"] = pd.to_numeric(pop_df["population"], errors="coerce")

        state_agg = state_agg.merge(pop_df, on="state", how="left")
        state_agg["homeless_per_10k"] = (
            state_agg["total"] / state_agg["population"] * 10000
        ).round(1)

        print("\nStates ranked by homeless rate per 10,000 residents:")
        ranked = (
            state_agg[["state", "total", "population", "homeless_per_10k",
                        "unsheltered_pct", "share_of_national"]]
            .sort_values("homeless_per_10k", ascending=False)
            .dropna(subset=["homeless_per_10k"])
            .reset_index(drop=True)
        )
        ranked.index += 1
        print(ranked.to_string())

        bottom5 = ranked.tail(5)
        top5 = ranked.head(5)
        print("\nLowest homeless rate states:")
        print(bottom5[["state", "homeless_per_10k", "unsheltered_pct"]].to_string())
        print("\nHighest homeless rate states:")
        print(top5[["state", "homeless_per_10k", "unsheltered_pct"]].to_string())
    else:
        print("Population column not found; skipping per-capita calculation.")
        print("Available columns:", list(pop_df.columns)[:10])

except Exception as e:
    print("Census download failed:", e)
    print("Per-capita analysis skipped. Download NST-EST file manually from:")
    print("  https://www2.census.gov/programs-surveys/popest/datasets/")

print("\nNational summary:")
print("  Total homeless (2023):", national_total)
unsheltered_national = state_agg["unsheltered"].sum()
print(f"  Unsheltered: {unsheltered_national} "
      f"({unsheltered_national / national_total * 100:.1f}%)")
# California may be absent from a partial or filtered file.
ca_share = state_agg.loc[state_agg["state"] == "CA", "share_of_national"]
print("  CA share of national total:",
      f"{ca_share.iloc[0]}%" if not ca_share.empty else "N/A")

Implementation notes for production use. The HUD Exchange PIT CSV column names include the year as a suffix (e.g., “Overall Homeless, 2023”) and change with each year's data release; the find_col helper in the script handles this by partial, case-insensitive matching. The Census NST-EST population file uses full state names rather than abbreviations; the lookup dictionary in the script covers all 50 states and DC but must be extended if analyzing Puerto Rico, which otherwise drops out of the per-capita table. CoC boundaries do not align to state boundaries in a handful of cases (multi-state rural CoCs); the script groups each such CoC under the single state in its State column, which is expected to match the state in the CoC number prefix, HUD's standard convention. Per-capita rates are calculated as homeless persons per 10,000 residents from each state's own count and population, so they do not depend on any other state. The share_of_national column is the metric that is sensitive to California's outsized total: excluding California raises every remaining state's share, while per-capita rates and their rank order among the remaining states are unchanged.
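The prefix convention can be sketched as a small helper for files that lack a usable State column. This is a minimal sketch, assuming CoC numbers begin with a two-letter state or territory code followed by a hyphen (as in CA-600); state_from_coc_number is a hypothetical name and is not part of the script above.

```python
import re
from typing import Optional

import pandas as pd

# Assumption: a CoC number starts with a two-letter state or territory
# code followed by a hyphen, e.g. "CA-600".
_COC_PREFIX = re.compile(r"^\s*([A-Za-z]{2})-")


def state_from_coc_number(coc_number: object) -> Optional[str]:
    """Return the upper-cased two-letter prefix, or None if there is no match."""
    match = _COC_PREFIX.match(str(coc_number))
    return match.group(1).upper() if match else None


# Illustrative values only; the last entry mimics a footer or total row.
sample = pd.Series(["CA-600", " ny-600", "Total"])
derived = sample.map(state_from_coc_number)
print(derived.tolist())
```

Rows for which the helper finds no prefix are worth inspecting by hand rather than dropping silently, since they may be footnotes or total rows rather than CoCs.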

The synthetic demo dataset is sized only to show the structure of the analysis; the actual 2023 PIT CSV file contains approximately 400 rows, one per CoC. When working with multi-year files, note that CoC boundaries and numbers change across years as CoCs are created, merged, or dissolved, which can introduce apparent trend discontinuities in individual CoC series.
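One way to surface those discontinuities before comparing years is to diff the sets of CoC numbers. The following is a minimal sketch under stated assumptions: coc_membership_changes is a hypothetical helper, both frames are assumed to carry a coc_number column like the one the script creates, and the CoC numbers and counts are made up for illustration.

```python
import pandas as pd


def coc_membership_changes(prev: pd.DataFrame, curr: pd.DataFrame,
                           key: str = "coc_number") -> dict:
    """Compare the CoC identifiers present in two years of PIT data."""
    prev_ids = set(prev[key].astype(str).str.strip())
    curr_ids = set(curr[key].astype(str).str.strip())
    return {
        "dropped": sorted(prev_ids - curr_ids),  # only in the earlier year
        "added": sorted(curr_ids - prev_ids),    # only in the later year
        "stable": sorted(prev_ids & curr_ids),   # present in both years
    }


# Made-up CoC numbers and totals, for illustration only.
pit_prev = pd.DataFrame({"coc_number": ["XX-500", "XX-501", "XX-502"],
                         "total": [100, 40, 60]})
pit_curr = pd.DataFrame({"coc_number": ["XX-500", "XX-503"],
                         "total": [120, 110]})

changes = coc_membership_changes(pit_prev, pit_curr)
print(changes)

# Year-over-year change restricted to CoCs present in both years.
stable = changes["stable"]
trend = (
    pit_prev[pit_prev["coc_number"].isin(stable)]
    .merge(pit_curr[pit_curr["coc_number"].isin(stable)],
           on="coc_number", suffixes=("_prev", "_curr"))
)
trend["change"] = trend["total_curr"] - trend["total_prev"]
print(trend)
```

A dropped number paired with an added one in the same state may indicate a merger or renumbering rather than a real change in homelessness, but confirming that requires checking HUD's notes for the years involved; the sketch only flags candidates.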

Medicaid Enrollment Data: The Federal Dataset Behind 90 Million Beneficiaries and $900 Billion in Annual Spending — Medicaid's 90 million enrollees overlap significantly with the homeless services population. This guide covers T-MSIS, MBES expenditure data, the COVID continuous-enrollment surge, managed care, dual eligibles, and the 2023 unwinding.

Census American Community Survey: The Rolling Sample Behind Housing, Poverty, and Income Estimates for Every US Community — the ACS produces annual housing cost burden, overcrowding, and poverty estimates at the tract and block group level that are essential context for interpreting CoC-level PIT count geographic patterns.

Bureau of Prisons Data: The Federal Inmate Population Behind 150,000 Federal Prisoners — the interaction between incarceration and homelessness is well documented; this guide covers BOP population data, offense breakdowns, USSC sentencing data, and recidivism statistics that contextualize the justice-system pathway into homelessness.