
Census American Housing Survey: The Biennial Housing Quality Database Behind US Structural Conditions and Neighborhood Characteristics

· AI Analytics

The American Housing Survey is the federal government's most detailed longitudinal record of housing quality in the United States. Sponsored by the Department of Housing and Urban Development and conducted by the Census Bureau on a biennial cycle, it returns to the same approximately 60,000 housing units every two years, tracking whether roofs leak, furnaces fail, rodents appear, and rents crowd out income. Its time series stretches back to 1973, although the longitudinal sample itself has been redrawn periodically, most recently in 2015. No other federal program attempts to follow the same physical structures over decades.

What the AHS Is and How It Differs from the ACS

The American Housing Survey is not a demographic survey that happens to include housing questions. It is a housing unit panel: the Census Bureau identifies a sample of addresses and returns to those exact addresses every wave, whether the original occupant is still there or not. New occupants answer the survey on behalf of the same structure. This design means analysts can track the physical condition and occupancy history of specific units across time — something that cross-sectional surveys cannot support.
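Because the structure, not the household, is the panel member, linking waves means joining on a unit identifier. The toy pandas sketch below shows that join for two hypothetical odd-year waves. It assumes CONTROL is the persistent unit identifier; OCCUPANT and ROOF_LEAK are illustrative stand-ins, not PUF variable names, and the records are invented.

```python
import pandas as pd

# Two hypothetical waves of a unit-level extract. CONTROL is assumed to be
# the persistent housing-unit identifier; OCCUPANT is an illustrative label
# for whichever household answered in that wave.
wave_2011 = pd.DataFrame({
    "CONTROL": [101, 102, 103],
    "OCCUPANT": ["A", "B", "C"],
    "ROOF_LEAK": [0, 1, 0],
})
wave_2013 = pd.DataFrame({
    "CONTROL": [101, 102, 103],
    "OCCUPANT": ["A", "D", "C"],  # unit 102 has a new household
    "ROOF_LEAK": [1, 1, 0],
})

# Join on the unit identifier: unit 102 stays in the panel despite the move.
panel = wave_2011.merge(wave_2013, on="CONTROL", suffixes=("_2011", "_2013"))
panel["new_occupant"] = panel["OCCUPANT_2011"] != panel["OCCUPANT_2013"]
panel["leak_onset"] = (panel["ROOF_LEAK_2011"] == 0) & (panel["ROOF_LEAK_2013"] == 1)

print(panel[["CONTROL", "new_occupant", "leak_onset"]])
```

A cross-sectional survey could report that roof leaks became more common. Only the unit-level join can show that unit 101 developed a leak under the same occupant while unit 102 changed hands.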

The AHS is fielded in odd-numbered years. The national sample covers roughly 60,000 housing units and is designed to produce reliable national and regional estimates. In addition to the national sample, the Census Bureau conducts metropolitan-area oversamples for approximately 25 large metropolitan statistical areas on a four-year rotation, providing the statistical precision needed for MSA-level analysis of housing conditions, cost burden, and neighborhood quality.

The American Community Survey, by contrast, is a cross-sectional population survey. Each year a new sample of addresses receives the ACS questionnaire; those addresses are not revisited in future waves. The ACS covers housing characteristics — tenure, structure type, year built, housing costs, plumbing and kitchen facilities — but its housing content is far shallower than the AHS. The ACS asks whether a unit has complete plumbing; the AHS asks about roof leaks, water intrusion from inside the structure, holes in floors, broken plaster, electrical problems, rodent sightings, and heating equipment failures in the past 12 months. The ACS produces small-area demographic context at the census tract level; the AHS produces deep housing condition data for a longitudinal panel. The two surveys are complements, not substitutes.

Survey Content: Structural Characteristics

The AHS collects an unusually granular set of structural variables for each sampled unit. Year built is recorded in categories that allow pre-war, postwar, and recent construction cohorts to be distinguished. Structure type distinguishes single-family detached homes, single-family attached (row houses and townhomes), multi-family buildings by unit count (2–4, 5–9, 10–19, 20–49, 50 or more), and mobile or manufactured homes. Number of rooms, number of bedrooms, and number of bathrooms are recorded separately. Square footage for new construction is collected in recent waves, though it is not available for older units.
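The structure-type bands amount to a simple lookup on the number of units in the building. The helper below is hypothetical and serves only to make the band edges concrete. The PUF ships its own coded structure-type variable, so this is a sketch of the categories, not an AHS recode.

```python
def structure_band(units_in_building: int, attached: bool = False,
                   mobile_home: bool = False) -> str:
    """Map a building's unit count to the structure-type bands listed above.

    Hypothetical helper for illustration only; not an official AHS recode.
    """
    if mobile_home:
        return "Mobile or manufactured home"
    if units_in_building < 1:
        raise ValueError("a building must contain at least one housing unit")
    if units_in_building == 1:
        return "Single-family attached" if attached else "Single-family detached"
    # Upper bound of each multi-family band, checked in ascending order
    for upper, label in [(4, "2-4 units"), (9, "5-9 units"),
                         (19, "10-19 units"), (49, "20-49 units")]:
        if units_in_building <= upper:
            return label
    return "50 or more units"


print(structure_band(1), structure_band(1, attached=True), structure_band(12))
```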

Plumbing completeness — whether the unit has hot and cold running water, a flush toilet, and a bathtub or shower — is a long-standing AHS variable that enables trend analysis across five decades. Kitchen completeness similarly tracks whether the unit has a sink with piped water, a range or stove, and a refrigerator. Heating fuel source distinguishes natural gas, electricity, fuel oil or kerosene, wood, solar energy, and no heating equipment. Foundation type and exterior wall material are recorded for owned homes. These variables collectively define the physical quality envelope of the housing stock.
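Both completeness measures are conjunctions: a unit is complete only if every listed item is present. This minimal sketch encodes the two definitions exactly as stated above. The field names are hypothetical, and the official AHS definitions may add conditions, such as whether facilities are for the unit's exclusive use, that are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class UnitFacilities:
    # Illustrative field names; the PUF records these items as separate coded variables.
    hot_and_cold_water: bool
    flush_toilet: bool
    bathtub_or_shower: bool
    sink_with_piped_water: bool
    range_or_stove: bool
    refrigerator: bool


def has_complete_plumbing(u: UnitFacilities) -> bool:
    # Hot and cold running water, a flush toilet, and a bathtub or shower
    return u.hot_and_cold_water and u.flush_toilet and u.bathtub_or_shower


def has_complete_kitchen(u: UnitFacilities) -> bool:
    # A sink with piped water, a range or stove, and a refrigerator
    return u.sink_with_piped_water and u.range_or_stove and u.refrigerator


# A unit with every facility except a refrigerator
unit = UnitFacilities(True, True, True, True, True, False)
print(has_complete_plumbing(unit), has_complete_kitchen(unit))  # True False
```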

Survey Content: Condition Problems

The AHS condition module is what sets it apart from every other federal housing data source. Respondents are asked whether, in the past 12 months, the unit experienced each of a standardized list of deficiencies: a leaking roof, water leaks from inside the structure (broken pipes, leaking plumbing fixtures), holes in the floor, broken plaster or peeling paint over an area larger than 8 by 11 inches, electrical wiring that frequently causes blown fuses or tripped breakers, exposed wiring, and evidence of rodents (mice, rats) inside the unit. Heating equipment breakdown, meaning that the main heating equipment was broken down for six or more hours during the cold months, is tracked separately.
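Because each deficiency is recorded as its own flag, national prevalence estimates are weighted means of indicators. A minimal pandas sketch follows. It assumes the flags have already been recoded to 0/1 (the PUF may use other yes/no codes) and that a WEIGHT column is present. The flag column names are hypothetical and the records are invented.

```python
import pandas as pd

# Invented extract: one row per unit, 0/1 deficiency flags, household weight.
units = pd.DataFrame({
    "roof_leak":      [1, 0, 0, 1],
    "interior_leak":  [0, 0, 1, 0],
    "rodents":        [1, 1, 0, 0],
    "heat_breakdown": [0, 0, 0, 1],
    "WEIGHT":         [1000.0, 2000.0, 1500.0, 500.0],
})
flags = ["roof_leak", "interior_leak", "rodents", "heat_breakdown"]

# Weighted prevalence: weight of units reporting the problem / total weight
prevalence = units[flags].mul(units["WEIGHT"], axis=0).sum() / units["WEIGHT"].sum()

# Number of reported problems per unit, the input to multi-problem rules
units["n_flags"] = units[flags].sum(axis=1)

print((prevalence * 100).round(1))
print(units["n_flags"].tolist())
```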

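From this condition data the Census Bureau and HUD derive a three-category adequacy classification. In broad terms, a unit is “severely inadequate” if it lacks complete plumbing for its exclusive use, has no electricity or a combination of serious electrical hazards, suffered repeated heating breakdowns over the winter, or has five or six of six upkeep problems (water leaks from outside, water leaks from inside, holes in floors, holes or open cracks in walls or ceilings, peeling paint or broken plaster, and signs of rodents). A unit is “moderately inadequate” if it has three or four of those upkeep problems, lacks complete kitchen facilities, relies on unvented gas, oil, or kerosene heaters as its main heat source, or has repeated toilet breakdowns. All other units are classified as “adequate,” so a single problem such as a leaking roof does not by itself make a unit inadequate. This adequacy variable (ADEQUACY in the microdata) drives the HUD Worst Case Housing Needs report to Congress.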
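The following is a simplified sketch of how an adequacy rule of this kind combines individual problems. It assumes HUD's commonly cited thresholds: five or six of six upkeep problems make a unit severely inadequate, and three or four make it moderately inadequate. The hallway and toilet-breakdown criteria are omitted, and all argument names are hypothetical. In practice, use the ADEQUACY variable shipped in the PUF rather than re-deriving it.

```python
# The six upkeep problems assumed to enter the adequacy rule
UPKEEP_PROBLEMS = frozenset({
    "outside_water_leak", "inside_water_leak", "floor_holes",
    "wall_or_ceiling_cracks", "peeling_paint_or_plaster", "rodents",
})


def classify_adequacy(
    upkeep: set,
    complete_plumbing: bool = True,
    complete_kitchen: bool = True,
    severe_electrical: bool = False,
    repeated_heating_breakdowns: bool = False,
    unvented_main_heater: bool = False,
) -> str:
    """Simplified three-way adequacy rule (hallway and toilet criteria omitted)."""
    # Count only problems that belong to the six-item upkeep list
    n_upkeep = len(upkeep & UPKEEP_PROBLEMS)
    if (not complete_plumbing or severe_electrical
            or repeated_heating_breakdowns or n_upkeep >= 5):
        return "severely inadequate"
    if not complete_kitchen or unvented_main_heater or n_upkeep >= 3:
        return "moderately inadequate"
    return "adequate"


# A single leaking roof (an outside water leak) leaves the unit adequate
print(classify_adequacy({"outside_water_leak"}))  # adequate
```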

Survey Content: Neighborhood Quality and Cost

Beyond the physical structure the AHS asks occupants to characterize the neighborhood surrounding the unit. The neighborhood module covers crime and vandalism visible in the neighborhood, abandoned or vandalized buildings on the block, street noise and traffic, trash and litter, and overall neighborhood rating. These assessments are subjective — they reflect the occupant's perception rather than administrative crime counts — but they provide contextual data that objective crime statistics at the census tract level do not always capture for individual blocks or addresses.

Cost data covers monthly contract rent for renters, and monthly mortgage payment, real estate taxes, homeowner insurance, and condominium or cooperative fees for owners. The survey records current market value as estimated by the owner, a variable that can be compared with the American Community Survey's owner-estimated value to understand differences in self-reported valuations between the two surveys. Household income is collected so that cost burden (rent or housing costs as a share of income) can be computed from the microdata rather than taken only from published tables.
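For owners, cost burden requires first putting every cost component on a common monthly basis. A minimal sketch follows. It assumes taxes and insurance are reported as annual amounts and the mortgage and condo fee as monthly amounts, which are assumptions to check against the codebook. Income is taken as annual.

```python
from typing import Optional


def monthly_owner_cost(mortgage: float, annual_taxes: float,
                       annual_insurance: float, condo_fee: float = 0.0) -> float:
    """Monthly owner cost; taxes and insurance are assumed to be annual amounts."""
    return mortgage + annual_taxes / 12 + annual_insurance / 12 + condo_fee


def cost_burden(monthly_cost: float, annual_income: float) -> Optional[float]:
    """Housing cost as a share of income; None when income is zero or negative."""
    if annual_income <= 0:
        return None
    return 12 * monthly_cost / annual_income


cost = monthly_owner_cost(1500, annual_taxes=3600, annual_insurance=1200)
print(cost, cost_burden(cost, 76_000))  # 1900.0 0.3
```

Returning None for non-positive income keeps such households out of burden tabulations instead of producing negative or infinite ratios.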

The Historical Record: 1973 to Present

The AHS national series begins in 1973, making it the longest-running US survey devoted specifically to housing quality. The depth of this record enables trend analysis that no other dataset can support. Several patterns stand out across the half-century record.

The share of housing units lacking complete plumbing fell dramatically over the first decades of the survey. In 1973 approximately 4.5 percent of all housing units lacked complete plumbing — a measure that captures the persistence of rural and older urban substandard housing from the pre-war era. By 2021 that share had fallen below 0.5 percent, reflecting both housing turnover and rehabilitation. The near-elimination of incomplete plumbing in occupied units is one of the clearest documented improvements in US housing quality over the past 50 years.

Mobile and manufactured homes have held a remarkably stable share of the housing stock. The AHS has recorded a mobile home share of approximately 6 percent across most waves, reflecting the persistence of manufactured housing as an affordable ownership pathway in rural and exurban markets. The manufactured housing stock ages in place: with few new units added after shipments fell from their late-1990s level, the existing stock has grown steadily older.

Owner-occupancy rates tracked by the AHS follow the well-documented arc of the housing boom and bust. The owner-occupied share peaked at approximately 69 percent around the 2005 wave, consistent with the loose underwriting conditions of the subprime era. It fell to roughly 63 percent by the mid-2010s as foreclosures converted owner units to rental, and partially recovered to approximately 65 percent by 2021. The AHS permits this trend to be observed in the same panel of units, not merely in the cross-sectional snapshots that other surveys provide.

Median square footage for new single-family homes grew substantially over the period. In the early 1970s the median new single-family home covered approximately 1,500 square feet. By the 2015 and later waves median new single-family square footage exceeded 2,300 square feet, reflecting the long secular trend toward larger homes among new construction. This expansion in unit size occurred alongside a decline in average household size, producing a substantial increase in square footage per person.

The share of housing units classified as severely inadequate declined from approximately 8 percent in the 1970s to roughly 3 percent in recent waves. Moderately inadequate units declined proportionally. The improvement reflects investment in housing rehabilitation, demolition of the worst pre-war stock, and new construction that generally meets minimum habitability standards. However, the AHS condition module is self-reported, and some research suggests that respondents may under-report deficiencies relative to physical inspection — a limitation that must be accounted for when interpreting the trend as an absolute measure of housing quality.

Metropolitan Oversamples

In addition to the national panel, the Census Bureau surveys approximately 25 large metropolitan areas on a four-year rotation. Each metro-area survey oversamples units within the MSA to provide the statistical precision needed for MSA-level estimates of housing conditions, tenure, cost burden, and neighborhood quality. Because the national sample alone does not support reliable metro-level estimates for the full range of housing variables, the metropolitan oversample is essential for comparative urban research.

Recent metropolitan AHS waves include New York City (2021), Los Angeles (2019), and Chicago (2021). The metropolitan surveys use the same questionnaire as the national survey, enabling direct comparison across metros and against national estimates. Analysts can compare the share of renter households spending more than half their income on housing in New York versus Los Angeles versus the national average, or compare the prevalence of roof leaks, heating breakdowns, or rodent sightings across markets with very different housing stocks. This cross-metro comparability is rare in housing data: most city-level housing surveys use different methodologies and cannot be directly compared.
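As an example of the cross-metro comparison, the minimal pandas sketch below computes a weighted share of renter households spending more than half their income on housing, by metro and across all records. The records, the `metro` column, and the burden ratios are invented. In a real PUF wave, the metro identifier and the appropriate weights depend on the file. A true national benchmark would come from the national sample rather than from pooling these toy rows.

```python
import pandas as pd

# Invented renter records; "metro" stands in for whatever metro identifier
# a given PUF wave provides, and "burden" is housing cost / income.
renters = pd.DataFrame({
    "metro": ["New York", "New York", "Los Angeles", "Los Angeles", "Elsewhere"],
    "burden": [0.62, 0.28, 0.55, 0.51, 0.20],
    "WEIGHT": [300.0, 700.0, 400.0, 100.0, 500.0],
})

# Severe burden: housing costs above half of income
renters["severe"] = renters["burden"] > 0.50
renters["w_severe"] = renters["WEIGHT"] * renters["severe"]

# Weighted share = weight of severely burdened renters / weight of all renters
sums = renters.groupby("metro")[["w_severe", "WEIGHT"]].sum()
by_metro = sums["w_severe"] / sums["WEIGHT"]
overall = renters["w_severe"].sum() / renters["WEIGHT"].sum()

print(by_metro.round(2))
print("All records:", round(overall, 2))
```

Summing weighted numerators and denominators separately, rather than averaging per-row ratios, keeps each metro's share correctly weighted.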

HUD Worst Case Housing Needs

The most prominent policy use of AHS microdata is the biennial HUD Worst Case Housing Needs report submitted to Congress. The report operationalizes a specific definition of housing hardship: very low income renter households — those with incomes below 50 percent of the area median income — who receive no federal housing assistance and who either pay more than half their gross income for rent and utilities, or live in a unit classified as severely inadequate under the AHS adequacy measure, or both. Households meeting these criteria are defined as having “worst case needs.”
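The worst-case-needs definition is a chain of filters followed by an either-or test. The sketch below encodes the definition as stated above. It makes several assumptions. The caller supplies the applicable HUD very-low-income limit, roughly 50 percent of area median income; official limits also vary with household size. Income at or below the limit counts as very low income. Rent includes utilities. The report's handling of edge cases such as zero or negative income is not reproduced.

```python
def has_worst_case_needs(
    is_renter: bool,
    annual_income: float,
    very_low_income_limit: float,
    monthly_gross_rent: float,
    federally_assisted: bool,
    severely_inadequate: bool,
) -> bool:
    """Sketch of the worst-case-needs test; the income limit is caller-supplied."""
    # Only unassisted renters are eligible
    if not is_renter or federally_assisted:
        return False
    # Very low income: at or below the limit (boundary handling is an assumption)
    if annual_income > very_low_income_limit:
        return False
    # Severe rent burden: rent plus utilities above half of income
    severe_burden = 12 * monthly_gross_rent > 0.5 * annual_income
    return severe_burden or severely_inadequate


print(has_worst_case_needs(True, 20_000, 40_000, 900, False, False))  # True
```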

The 2023 Worst Case Housing Needs report found 8.5 million households in this condition. The count has not risen in every reporting period, but its recent growth was driven primarily by rent increases that outpaced income growth among very low income renters. The primary structural driver identified in the report is the gap between the number of households eligible for Housing Choice Vouchers and the number of vouchers actually funded. HCV funding reaches fewer than one in four eligible households, leaving the majority to compete in private markets without subsidy while paying housing costs that leave little income for other necessities. The AHS microdata is the direct source for these counts. Without the adequacy classification and the joint distribution of income, rent, and assistance status that the AHS provides, the report could not be produced.

Data Access and Key Variables

AHS microdata is available at census.gov/programs-surveys/ahs/data.html. The Census Bureau publishes Public Use Files (PUF) in both SAS transport format and flat CSV. The national PUF is released approximately six months after each survey wave and covers all occupied and vacant housing units in the national sample. Metropolitan PUFs are released on the same schedule as their survey waves.

Key variables in the AHS PUF include the following; names and codes differ between PUF vintages, so confirm them against the codebook for the wave in use. TENURE (1 = owned or buying, 2 = rented, 3 = occupied without payment of rent), ROOMS (total rooms), BEDRMS (bedrooms), BATHROOMS (full and half baths), BUILT (year structure built, in bands), STRUCTURETYPE (single-family detached, attached, 2–4 unit, 5–9 unit, 10–19 unit, 20–49 unit, 50+ unit, mobile home), RENT (monthly contract rent for renters), VALUE (owner-estimated current market value for owners), ZINC2 (household income, annualized), HHSEX (sex of householder), HHRACE (race of householder), and ADEQUACY (1 = adequate, 2 = moderately inadequate, 3 = severely inadequate). Condition deficiency flags (roof leaks, interior water leaks, rodents, electrical problems, heating breakdowns, holes in floors, broken plaster) are each coded as separate binary indicator variables.
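Before any of these variables can be used, the raw PUF columns usually need light cleaning. The minimal sketch below makes two assumptions. First, character codes arrive wrapped in single quotes (for example `'2'`), as the script later in this article also assumes. Second, -6 (not applicable) and -9 (not reported) are the missing-value sentinels. Verify both against the codebook. The function name is hypothetical and the sample rows are invented.

```python
import io

import pandas as pd

# Assumed missing-value codes: -6 (not applicable) and -9 (not reported)
SENTINELS = (-6, -9)


def clean_puf(df: pd.DataFrame, char_cols=(), numeric_cols=()) -> pd.DataFrame:
    """Strip quote wrapping from coded columns and null out sentinel codes."""
    out = df.copy()
    for col in char_cols:
        out[col] = out[col].astype(str).str.strip("'")
    for col in numeric_cols:
        values = pd.to_numeric(out[col].astype(str).str.strip("'"), errors="coerce")
        out[col] = values.mask(values.isin(SENTINELS))
    return out


raw = io.StringIO(
    "CONTROL,TENURE,RENT,ADEQUACY\n"
    "11000001,'2',1250,'1'\n"
    "11000002,'1',-6,'3'\n"
)
clean = clean_puf(pd.read_csv(raw), char_cols=["TENURE", "ADEQUACY"],
                  numeric_cols=["RENT"])
print(clean)
```

Masking only the listed sentinels, rather than every negative value, preserves variables such as income that can legitimately be negative.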

The Census Bureau's API does not expose AHS data: there is no AHS endpoint at api.census.gov, so all AHS analysis requires downloading the microdata files and working with them locally. The Census Bureau previously offered the DataFerrett tool for AHS cross-tabulations, but that tool has been retired. For analysts who do not need custom tabulations from the microdata, the AHS Table Creator and the published summary tables at census.gov provide ready-made tabulations.

The PUF includes a household weight variable (WEIGHT) that expands the sample to national population estimates. All nationally representative analyses should apply this weight. The Census Bureau publishes replicate weights for variance estimation using the successive difference replication method, enabling analysts to compute standard errors that account for the complex sample design.
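A weighted estimate and its replicate-based standard error can be computed together. The minimal sketch below rests on two assumptions: the replicate weights are named REPWEIGHT1 through REPWEIGHT160, and the variance uses the 4/R factor standard for successive difference replication. Confirm both against the AHS variance documentation for the wave. For a share, pass a 0/1 indicator such as the script's `cost_burdened` column.

```python
import numpy as np
import pandas as pd


def sdr_estimate(df, value_col, weight_col="WEIGHT", rep_prefix="REPWEIGHT", n_reps=160):
    """Weighted mean of value_col and its successive-difference-replication SE.

    Replicate weight names and the 4/R factor are assumptions to verify.
    """
    values = df[value_col].astype(float)
    full = np.average(values, weights=df[weight_col])
    # Re-estimate with each replicate weight in turn
    reps = np.array([
        np.average(values, weights=df[f"{rep_prefix}{r}"])
        for r in range(1, n_reps + 1)
    ])
    variance = 4.0 / n_reps * np.sum((reps - full) ** 2)
    return float(full), float(np.sqrt(variance))


# Toy data with four replicates so the arithmetic is easy to follow
toy = pd.DataFrame({
    "burdened": [1, 0],
    "WEIGHT": [1.0, 1.0],
    "REPWEIGHT1": [2.0, 0.0],
    "REPWEIGHT2": [0.0, 2.0],
    "REPWEIGHT3": [1.0, 1.0],
    "REPWEIGHT4": [1.0, 1.0],
})
share, se = sdr_estimate(toy, "burdened", n_reps=4)
print(share, round(se, 4))  # 0.5 0.7071
```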

Python: Renter Cost Burden by Building-Age Cohort

The following script downloads the AHS 2021 National Public Use File, filters to occupied renter units, and computes the share of renters spending 30 percent or more of household income on rent, broken out by the age cohort of the building. Thirty percent is the standard HUD cost burden threshold. Because the script uses contract rent, which excludes utilities, its ratios understate burden as HUD measures it on gross rent. The analysis asks whether newer apartment buildings are systematically more or less likely to impose cost burden on their occupants than older stock. The comparison is descriptive: it does not control for the selection of lower-income renters into older and less desirable units.

import pandas as pd
import urllib.request
import zipfile
import io

# ---------------------------------------------------------------
# Step 1: Download AHS 2021 National Public Use File (PUF)
# Available at:
#   https://www.census.gov/programs-surveys/ahs/data/2021/ahs-2021-public-use-file--puf-/ahs-2021-national-public-use-file--puf-.html
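# The ZIP may hold several CSVs (for example household, person,
# mortgage, and project files); the household-level file is needed.
# ---------------------------------------------------------------
PUF_URL = (
    "https://www2.census.gov/programs-surveys/ahs/2021/AHS%202021%20National%20PUF%20v1.0%20CSV.zip"
)

print("Downloading AHS 2021 National PUF...")
with urllib.request.urlopen(PUF_URL) as resp:
    zf = zipfile.ZipFile(io.BytesIO(resp.read()))
    csv_names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
    # Prefer a file whose name mentions "household"; otherwise fall back
    # to the first CSV. Check the archive contents for the wave in use.
    household_files = [n for n in csv_names if "household" in n.lower()]
    csv_name = (household_files or csv_names)[0]
    df = pd.read_csv(zf.open(csv_name), low_memory=False)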

print("Total records:", len(df))

# ---------------------------------------------------------------
# Step 2: Filter to occupied renter units
# TENURE: '1' = owned/bought, '2' = rented, '3' = occupied without
#         payment. We want renters (TENURE == 2 after normalizing).
# Vacant units have no household income; if any vacant record
# carries a renter TENURE code, the income filter in Step 3 drops it.
# ---------------------------------------------------------------
# Census PUFs wrap character codes in single quotes ('2'). Normalize
# TENURE to a number whether pandas read it as text or as digits.
tenure = pd.to_numeric(df["TENURE"].astype(str).str.strip("'"), errors="coerce")

renters = df[tenure == 2].copy()
print("Occupied renter units:", len(renters))

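# ---------------------------------------------------------------
# Step 3: Convert key variables to numeric
# ZINC2  = household income (annual, dollars)
# RENT   = monthly contract rent
# BUILT  = year structure was built
# Variable names differ across PUF vintages. As an assumption, newer
# files may use HINCP (income) and YRBUILT (year built); map them
# onto the names used below when the older names are absent.
# ---------------------------------------------------------------
for new_name, old_name in [("HINCP", "ZINC2"), ("YRBUILT", "BUILT")]:
    if old_name not in renters.columns and new_name in renters.columns:
        renters = renters.rename(columns={new_name: old_name})

for col in ["ZINC2", "RENT", "BUILT"]:
    # Strip any quote wrapping before numeric conversion
    renters[col] = pd.to_numeric(
        renters[col].astype(str).str.strip("'"), errors="coerce"
    )

# Missing-value sentinels (assumed: -6 = not applicable,
# -9 = not reported) are codes, not real values
sentinels = [-6, -9]
for col in ["ZINC2", "RENT", "BUILT"]:
    renters[col] = renters[col].mask(renters[col].isin(sentinels))

# Drop rows missing income, rent, or year built
renters = renters.dropna(subset=["ZINC2", "RENT", "BUILT"])

# Exclude zero or negative income and zero rent, for which a burden
# ratio is undefined or not meaningful
renters = renters[(renters["ZINC2"] > 0) & (renters["RENT"] > 0)]
print("Renters after cleaning:", len(renters))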

# ---------------------------------------------------------------
# Step 4: Compute annual rent and cost burden
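# HUD measures cost burden against gross rent (contract rent plus
# utilities). RENT is contract rent only, so these ratios understate
# HUD-style burden; add utility costs for a closer match.
# RENT is monthly; annualize it for comparison with annual ZINC2.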
# ---------------------------------------------------------------
renters["annual_rent"] = renters["RENT"] * 12
renters["rent_burden_ratio"] = renters["annual_rent"] / renters["ZINC2"]
renters["cost_burdened"] = renters["rent_burden_ratio"] >= 0.30
renters["severely_burdened"] = renters["rent_burden_ratio"] >= 0.50

# ---------------------------------------------------------------
# Step 5: Assign building-age cohorts
# ---------------------------------------------------------------
def age_cohort(year):
    if year < 1940:
        return "Pre-1940"
    elif year < 1970:
        return "1940-1969"
    elif year < 2000:
        return "1970-1999"
    else:
        return "2000+"

renters["cohort"] = renters["BUILT"].apply(age_cohort)

# ---------------------------------------------------------------
# Step 6: Summarize cost burden rates by cohort
# ---------------------------------------------------------------
cohort_order = ["Pre-1940", "1940-1969", "1970-1999", "2000+"]

summary = (
    renters.groupby("cohort")
    .agg(
        units=("rent_burden_ratio", "count"),
        pct_burdened=("cost_burdened", "mean"),
        pct_severely_burdened=("severely_burdened", "mean"),
        median_rent=("RENT", "median"),
        median_income=("ZINC2", "median"),
    )
    # reindex (rather than .loc) tolerates a cohort with no records
    .reindex(cohort_order)
    .rename_axis("cohort")
    .reset_index()
)

summary["pct_burdened"] = (summary["pct_burdened"] * 100).round(1)
summary["pct_severely_burdened"] = (summary["pct_severely_burdened"] * 100).round(1)
# Nullable integer dtype keeps <NA> for any empty cohort
summary["median_rent"] = summary["median_rent"].round(0).astype("Int64")
summary["median_income"] = summary["median_income"].round(0).astype("Int64")

print("")
print("Renter cost burden by building-age cohort (AHS 2021 National PUF)")
print(summary.to_string(index=False))

# ---------------------------------------------------------------
# Step 7: Statistical note
# The AHS PUF carries a household weight (WEIGHT) for nationally
# representative estimates. This script reports unweighted counts
# and shares for clarity; production analyses should compute
# weighted shares with WEIGHT and standard errors from the
# replicate weights.
# ---------------------------------------------------------------
print("")
print("Note: unweighted estimates. Apply WEIGHT for population estimates.")

The expected finding from AHS data is that cost burden rates do not decline monotonically with building age. Pre-1940 stock — concentrated in dense urban cores where rents are high — often shows cost burden rates comparable to recently constructed units in high-cost metros. The 1970–1999 cohort, which includes the large suburban rental stock built during the apartment construction boom of the 1970s, often shows the lowest cost burden rates in lower-cost markets, because that stock has aged into relative affordability without the neighborhood premium of older urban housing. The 2000-and-later cohort varies widely by metro: in markets with significant recent luxury construction, newer units serve higher-income renters and show lower burden rates; in supply-constrained coastal markets, even new units command rents that burden moderate-income households.


The AHS condition data on housing adequacy complements HUD's Low Income Housing Tax Credit production database in assessing whether subsidized affordable housing stock avoids the structural deficiencies concentrated in the older unsubsidized rental inventory. See HUD LIHTC Database: Mapping 35 Years of Low-Income Housing Tax Credit Projects.

For the cross-sectional demographic and housing cost context that the AHS panel supplements at the census tract level — including cost burden rates by neighborhood, housing tenure, and household composition — see Census ACS: The American Community Survey and the Federal Demographic Dataset Behind Every Policy Decision.