
NHTSA FARS: The Federal Census of Every US Traffic Fatality Since 1975

· AI Analytics

Every motor vehicle crash in the United States that kills someone within 30 days is recorded in the Fatality Analysis Reporting System, a complete enumeration rather than a sample. Maintained by the National Highway Traffic Safety Administration since 1975, FARS is the authoritative federal census of traffic death in America: roughly 38,000 to 42,000 fatalities per year in recent years, each documented with crash geometry, vehicle characteristics, driver behavior, and person-level outcomes.

What FARS Is

The Fatality Analysis Reporting System is a nationwide database covering every motor vehicle traffic crash on a public road in the United States that results in the death of any person (occupant, pedestrian, or cyclist) within 30 days of the crash. Coverage extends to all 50 states, the District of Columbia, and Puerto Rico. Data collection began with calendar year 1975, and the file is published annually, typically 12 to 18 months after the reference year.

The distinction between census and sample matters here. NHTSA does not estimate fatality counts using a sample frame and expansion weights. Every qualifying crash is individually coded from police crash reports, death certificates, hospital records, and emergency medical service documentation. When FARS reports 42,795 fatalities in 2022, that is a count of 42,795 individually documented deaths, not a projection from a subset. That census character makes FARS reliable down to the state and, with care, to the county level in ways that survey-based datasets cannot be.

Operationally, state highway safety offices and state police agencies collect the source documents for each fatal crash and transmit coded records to NHTSA. NHTSA quality-checks and standardizes the submissions, then publishes the national file. The coding manual and field definitions are documented in the FARS Analytical Users Manual, updated annually and available through the NHTSA research data portal.

The FARS Data Structure

FARS organizes each fatal crash into a hierarchy of linked relational tables. A single crash can involve multiple vehicles and multiple persons, so the data is structured to capture that complexity without losing granularity.

The accident file contains one record per crash. Key fields include the crash date, time, state and county FIPS codes, city and route type (Interstate, US route, state route, county road, local street), a work zone flag, light condition at time of crash, weather conditions, the first harmful event that caused or contributed to the fatality, the manner of collision, and summary counts of vehicles involved, persons involved, and total fatalities. The crash identifier is ST_CASE — a unique numeric code within each state-year combination.

The vehicle file contains one record per motor vehicle in the crash. Fields include vehicle type (passenger car, light truck, large truck, bus, motorcycle), body type, make, model, and model year, the posted speed limit, the officer-estimated travel speed, a hit-and-run indicator, rollover and fire flags, driver presence, driver age, and — when blood or breath testing was conducted — the driver's blood alcohol concentration (BAC). Not every driver is tested; NHTSA uses multiple imputation to estimate BAC distributions for untested drivers in its published alcohol statistics, though the raw vehicle file contains only tested values plus a test type code.

The person file contains one record per person involved in the crash — drivers, passengers, pedestrians, bicyclists, and motorcyclists alike. Fields include age, sex, person type (driver, passenger, pedestrian, bicyclist, other non-motorist), injury severity coded on the KABCO scale (K=killed, A=serious injury, B=minor injury, C=possible injury, O=no injury), restraint system use, airbag deployment status, whether the person was ejected from the vehicle, helmet use for motorcyclists and bicyclists, and BAC test result when available.
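The person file stores injury severity as a numeric code rather than as KABCO letters. The sketch below shows one way to translate between the two. The code-to-letter mapping is an assumption based on recent coding manuals and should be checked against the Users Manual for the data year in use; the sample rows are invented.

```python
import pandas as pd

# Assumed INJ_SEV codes (verify against the Users Manual for your data year).
INJ_SEV_TO_KABCO = {
    4: "K",  # fatal injury
    3: "A",  # suspected serious injury
    2: "B",  # suspected minor injury
    1: "C",  # possible injury
    0: "O",  # no apparent injury
}

# Hypothetical person records; real rows would come from the person file.
persons = pd.DataFrame({
    "ST_CASE": [10001, 10001, 10002, 10002],
    "INJ_SEV": [4, 0, 3, 9],
})

# Codes outside the five KABCO levels (for example, unknown) stay distinct
# instead of being silently folded into a severity level.
persons["kabco"] = persons["INJ_SEV"].map(INJ_SEV_TO_KABCO).fillna("unknown/other")
kabco_counts = persons["kabco"].value_counts().to_dict()
print(kabco_counts)
```

Keeping the unmapped codes visible makes it easier to notice when a given year uses values the mapping does not cover.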

Additional supplemental files cover crash factors (CF1–CF3), vehicle factors, driver factors, distraction codes, and maneuvers. The full annual release typically contains 20–30 interrelated CSV files linkable through ST_CASE and VEH_NO (vehicle number within crash).
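To make the linkage concrete, here is a minimal pandas sketch that joins miniature accident, vehicle, and person tables on ST_CASE and VEH_NO. All rows are invented, and the convention that VEH_NO = 0 marks a person who was not in a vehicle is an assumption to confirm in the Users Manual.

```python
import pandas as pd

# Hypothetical miniature tables; the values are made up for illustration.
accident = pd.DataFrame({"ST_CASE": [10001, 10002], "FATALS": [1, 1]})
vehicle = pd.DataFrame({
    "ST_CASE": [10001, 10001, 10002],
    "VEH_NO": [1, 2, 1],
    "MOD_YEAR": [2015, 2008, 2020],
})
person = pd.DataFrame({
    "ST_CASE": [10001, 10001, 10002, 10002],
    "VEH_NO": [1, 2, 1, 0],  # assumption: 0 = person not in a vehicle
    "PER_NO": [1, 1, 1, 1],
    "INJ_SEV": [4, 0, 0, 4],
})

# Person -> vehicle on the composite key. A left join keeps non-occupants,
# who simply have no matching vehicle row.
linked = person.merge(vehicle, on=["ST_CASE", "VEH_NO"], how="left",
                      validate="many_to_one")
# Crash-level fields attach on ST_CASE alone.
linked = linked.merge(accident, on="ST_CASE", how="left", validate="many_to_one")

# Consistency check: fatal persons per crash should match crash-level FATALS.
fatal_counts = linked[linked["INJ_SEV"] == 4].groupby("ST_CASE").size()
print(linked)
print(fatal_counts)
```

When several years are stacked into one table, the data year would need to be part of the join key as well, since case numbers are only unique within a year.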

Key Coded Variables

FARS uses numeric codes throughout, with lookup tables in the Users Manual. Several variables are analytically central:

The COVID Anomaly and Recent Trends

Traffic fatality trends over the past several years present a striking paradox. In 2020, vehicle miles traveled fell approximately 13% as pandemic lockdowns kept Americans at home. The intuitive prediction (fewer miles driven, fewer deaths) proved incorrect. The fatality rate per 100 million vehicle miles traveled spiked to 1.37, the highest rate since 2007. Total fatalities rose from 36,096 in 2019 to 38,824 in 2020, an increase of about 7.6%, even as miles driven dropped sharply, so the per-mile rate surged.
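The rate arithmetic can be reproduced from the figures quoted above. This sketch backs out the vehicle miles traveled implied by the 2020 count and rate, then applies the approximate 13% decline to infer a 2019 rate. Treating the 13% figure as exact is a simplification, so the result is indicative only.

```python
def rate_per_100m_vmt(fatalities: float, vmt_miles: float) -> float:
    """Fatalities per 100 million vehicle miles traveled."""
    return fatalities / vmt_miles * 1e8

# Figures quoted in the text.
fatalities_2019, fatalities_2020 = 36_096, 38_824
rate_2020 = 1.37
vmt_drop = 0.13  # approximate; treated as exact here for illustration

# Back out the VMT implied by the 2020 count and rate, then the 2019 VMT.
vmt_2020 = fatalities_2020 / rate_2020 * 1e8
vmt_2019 = vmt_2020 / (1 - vmt_drop)
rate_2019 = rate_per_100m_vmt(fatalities_2019, vmt_2019)

print(f"implied 2020 VMT: {vmt_2020 / 1e12:.2f} trillion miles")
print(f"implied 2019 rate: {rate_2019:.2f} per 100M VMT")
```

Under these inputs the implied 2019 rate comes out near 1.1 deaths per 100 million miles, which is what makes the jump to 1.37 stand out.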

NHTSA's analysis of FARS records for 2020 identifies a consistent pattern: drivers who remained on largely empty roads drove faster, were less likely to wear seatbelts, and were more likely to be impaired. Speed was involved in a higher proportion of fatal crashes than in prior years. Speeding-related fatalities increased. The behavioral response to empty roads — not just reduced exposure — drove the rate increase.

In 2021, total fatalities reached 42,939, the highest count since 2005, as miles driven recovered while the deterioration in driving behavior persisted. The 2022 FARS data showed modest improvement, falling to 42,795, and the 2023 preliminary estimates indicated further decline. The multi-year period following the pandemic has elevated traffic death to a renewed policy priority, with the U.S. Department of Transportation's National Roadway Safety Strategy targeting a reduction toward zero under the Safe System framework.

Alcohol-Impaired Driving: A Public Health Success Story in the Data

One of the clearest long-term trends visible in FARS is the decline in alcohol-impaired driving fatalities. In the mid-1980s, crashes involving at least one driver with BAC at or above 0.08 accounted for roughly 20,000 deaths per year, more than half of all traffic fatalities. By the late 2010s, that count had fallen to approximately 10,500 per year, or roughly 28% of all traffic deaths.

The reduction reflects several decades of coordinated intervention: the national minimum drinking age of 21 (1984), per se BAC limits lowered from 0.10 to 0.08 across all states by 2004, sobriety checkpoints, ignition interlock mandates for DUI offenders, and sustained public awareness campaigns. FARS's DRUNK_DR field allows researchers to track the impact of these interventions with precision unavailable in any other federal dataset — a 50-year time series of every alcohol-involved traffic fatality, coded by state and jurisdiction.

The remaining challenge is the BAC imputation problem. Only a fraction of drivers in fatal crashes are actually tested for BAC; the fraction varies by state and by survival status (surviving drivers are tested at lower rates than decedents). NHTSA uses multiple imputation, modeling BAC likelihood from crash characteristics and driver behavior indicators, to produce the headline alcohol-involvement statistics. The BAC fields in the main person and vehicle files contain only tested values; the imputed values are distributed separately, in the multiple-imputation files described in the Users Manual, and underlie the figures in NHTSA reports.

Pedestrian and Cyclist Fatalities: A Worsening Trend

While alcohol-impaired fatalities have declined sharply since the 1980s, pedestrian fatalities have moved in the opposite direction over the past decade and a half. In 2009, FARS recorded approximately 4,109 pedestrian deaths. By 2021 that number had risen to 7,342 — an increase of nearly 79% over twelve years and the highest pedestrian death toll since 1981. Cyclist fatalities have also increased, though from a smaller base.
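The percentage quoted above follows directly from the two endpoint counts. As a short worked example, the sketch below also derives the compound annual growth rate that the endpoints imply, which is a summary measure added here and not a figure from NHTSA.

```python
ped_2009, ped_2021 = 4_109, 7_342  # pedestrian deaths quoted in the text

pct_increase = (ped_2021 - ped_2009) / ped_2009 * 100
years = 2021 - 2009
# Compound annual growth rate implied by the two endpoints only;
# it says nothing about the path taken in between.
annual_growth = ((ped_2021 / ped_2009) ** (1 / years) - 1) * 100

print(f"total increase: {pct_increase:.1f}%")
print(f"average annual growth: {annual_growth:.1f}%")
```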

The FARS person file's PER_TYP field enables isolation of these trends with full state-level detail. Contributing factors identified in the literature and in NHTSA analyses include:

Night is disproportionately represented in pedestrian fatalities. FARS light condition data consistently shows that a majority of pedestrian deaths occur in dark conditions despite the majority of pedestrian trips occurring in daylight. The LGT_COND field distinguishes between dark-lighted and dark-not-lighted roadways, enabling policy analysis of street lighting as an intervention.
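A tabulation of pedestrian deaths by light condition might look like the following sketch. The LGT_COND code values are assumptions drawn from recent coding manuals and should be confirmed for the data year; the rows are invented and assume LGT_COND has already been joined from the accident file on ST_CASE.

```python
import pandas as pd

# Assumed LGT_COND codes; confirm against the Users Manual for the data year.
LIGHT_LABELS = {
    1: "daylight",
    2: "dark, not lighted",
    3: "dark, lighted",
    4: "dawn",
    5: "dusk",
    6: "dark, unknown lighting",
}
DARK_CODES = {2, 3, 6}

# Hypothetical rows: one per fatally injured pedestrian.
ped_deaths = pd.DataFrame({
    "ST_CASE": range(1, 11),
    "LGT_COND": [2, 2, 3, 3, 3, 1, 1, 5, 6, 2],
})

ped_deaths["light"] = ped_deaths["LGT_COND"].map(LIGHT_LABELS).fillna("other/unknown")
share_by_light = ped_deaths["light"].value_counts(normalize=True)
dark_share = ped_deaths["LGT_COND"].isin(DARK_CODES).mean()

print(share_by_light)
print(f"share in dark conditions: {dark_share:.0%}")
```

Splitting the dark share into lighted and not-lighted roadways is the comparison that bears on street lighting as an intervention.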

CRSS: The Companion to FARS for Non-Fatal Crashes

FARS covers only fatal crashes, but the universe of police-reported crashes includes millions of non-fatal injury and property-damage-only events. The Crash Report Sampling System (CRSS) is the NHTSA program that covers that broader population. CRSS draws a stratified sample of police-reported crashes from a set of geographic primary sampling units and applies sampling weights to produce national estimates of crash counts, injury counts, and property damage by crash type.
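The practical consequence of sampling weights is that national estimates come from summing weights, not counting rows. A minimal sketch with invented records follows; the column name WEIGHT is an assumption and the weight values are hypothetical.

```python
import pandas as pd

# Hypothetical sampled crashes. WEIGHT is the number of crashes in the
# population that each sampled record stands for (column name assumed).
sample = pd.DataFrame({
    "severity": ["injury", "injury", "property_damage",
                 "property_damage", "property_damage"],
    "WEIGHT": [120.0, 80.0, 300.0, 250.0, 450.0],
})

# National estimate per category = sum of weights, not a count of rows.
estimates = sample.groupby("severity")["WEIGHT"].sum()
raw_counts = sample.groupby("severity").size()

print(pd.DataFrame({"sampled_rows": raw_counts, "weighted_estimate": estimates}))
```

In FARS the equivalent operation is a plain row count, because every qualifying crash is present in the file.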

CRSS replaced the earlier National Automotive Sampling System General Estimates System (NASS GES) beginning with 2016 data. Where FARS provides exact counts for the fatal subset, CRSS provides weighted estimates for the full crash distribution. The two datasets are designed to be complementary: FARS gives the complete picture of fatal outcomes; CRSS gives the injury pyramid below the fatal threshold. Analysts modeling crash severity distributions, estimating the cost of road crashes, or evaluating the benefit-cost ratio of safety interventions typically use both.

An important distinction: CRSS figures are estimates with sampling variance and confidence intervals. A CRSS estimate of 2.1 million injury crashes carries a coefficient of variation that affects statistical comparisons across years. FARS counts carry no sampling variance — the uncertainty in the fatal count is definitional (the 30-day rule, the public road requirement) rather than statistical.
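To illustrate why sampling variance matters for year-over-year comparisons, the sketch below computes a coefficient of variation, an approximate 95% interval, and a z statistic for the change between two estimates. The point estimates and standard errors are hypothetical, and treating the two years as independent is a simplifying assumption.

```python
import math

def summarize(estimate: float, std_error: float, z: float = 1.96):
    """Return the coefficient of variation and an approximate 95% interval."""
    cv = std_error / estimate
    return cv, (estimate - z * std_error, estimate + z * std_error)

def z_for_difference(est_a, se_a, est_b, se_b):
    """z statistic for the difference of two estimates assumed independent."""
    return (est_b - est_a) / math.sqrt(se_a**2 + se_b**2)

# Hypothetical (estimate, standard error) pairs for two consecutive years.
year1 = (2_100_000, 90_000)
year2 = (2_250_000, 95_000)

cv1, ci1 = summarize(*year1)
z = z_for_difference(*year1, *year2)

print(f"CV: {cv1:.1%}, 95% CI: {ci1[0]:,.0f} to {ci1[1]:,.0f}")
print(f"z for the year-over-year change: {z:.2f}")
```

With these made-up inputs the apparent increase of 150,000 crashes is not distinguishable from sampling noise at the 5% level, a question that does not arise for FARS counts.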

Data Access

NHTSA publishes FARS data through several channels at nhtsa.gov/research-data/fatality-analysis-reporting-system-fars. The primary access methods are:

Python: Pedestrian Fatality Rates by State

The following script downloads the FARS national CSV archive for the most recent published year, extracts the accident and person files, filters the person file to fatal pedestrian injuries, aggregates by state, merges with Census ACS population estimates, and identifies the ten states with the highest pedestrian fatality rates per 100,000 residents.

import requests
import pandas as pd
import io
import zipfile

# Most recent year of published FARS data
YEAR = 2022

# NHTSA publishes annual FARS flat files as a single ZIP archive of CSVs.
# The accident file has one row per crash; the person file has one row per person.
# Both carry ST_CASE (the crash ID), so crash-level fields can be joined to persons if needed.

BASE_URL = f"https://static.nhtsa.gov/nhtsa/downloads/FARS/{YEAR}/National/"
archive_url = f"{BASE_URL}FARS{YEAR}NationalCSV.zip"

resp = requests.get(archive_url, timeout=180)
resp.raise_for_status()


def find_csv(names, keyword):
    # Case-insensitive substring match (file names vary slightly by year).
    # Taking the shortest match avoids picking up related supplemental files
    # whose names merely start with the same keyword.
    matches = [n for n in names if keyword in n.lower() and n.lower().endswith(".csv")]
    return min(matches, key=len)


with zipfile.ZipFile(io.BytesIO(resp.content)) as zf:
    names = zf.namelist()
    acc_name = find_csv(names, "accident")
    per_name = find_csv(names, "person")
    # latin-1 decodes any byte sequence, which avoids a UnicodeDecodeError
    # if a release contains non-UTF-8 characters.
    with zf.open(acc_name) as f:
        acc = pd.read_csv(f, dtype=str, encoding="latin-1")
    with zf.open(per_name) as f:
        per = pd.read_csv(f, dtype=str, encoding="latin-1")

# Numeric coercions
for col in ["STATE", "ST_CASE", "FATALS"]:
    acc[col] = pd.to_numeric(acc[col], errors="coerce")

for col in ["STATE", "ST_CASE", "PER_TYP", "INJ_SEV"]:
    per[col] = pd.to_numeric(per[col], errors="coerce")

# Person type codes: 1=driver, 2=passenger, 5=pedestrian, 6=bicyclist, 7=other cyclist
# Injury severity code 4 = fatality
fatal_persons = per[per["INJ_SEV"] == 4].copy()

# Map person type to readable label
type_map = {1: "Driver", 2: "Passenger", 5: "Pedestrian", 6: "Bicyclist", 7: "Cyclist"}
fatal_persons["person_type"] = fatal_persons["PER_TYP"].map(type_map).fillna("Other")

# Aggregate fatalities by state and person type
by_state_type = (
    fatal_persons.groupby(["STATE", "person_type"])
    .size()
    .reset_index(name="fatalities")
)

# Load Census ACS 2022 state population estimates (from Census API)
pop_url = (
    "https://api.census.gov/data/2022/acs/acs1"
    "?get=NAME,B01003_001E&for=state:*"
)
pop_resp = requests.get(pop_url, timeout=60)
pop_resp.raise_for_status()
pop_data = pop_resp.json()
pop_df = pd.DataFrame(pop_data[1:], columns=pop_data[0])
pop_df["STATE"] = pd.to_numeric(pop_df["state"], errors="coerce")
pop_df["population"] = pd.to_numeric(pop_df["B01003_001E"], errors="coerce")
pop_df = pop_df[["STATE", "NAME", "population"]]

# Pedestrian fatalities only, merged with population
ped = by_state_type[by_state_type["person_type"] == "Pedestrian"].copy()
ped = ped.merge(pop_df, on="STATE", how="left")
ped["ped_rate_per_100k"] = (ped["fatalities"] / ped["population"] * 100000).round(2)

# Top 10 states by pedestrian fatality rate
top10 = ped.sort_values("ped_rate_per_100k", ascending=False).head(10)

print(top10[["NAME", "fatalities", "population", "ped_rate_per_100k"]].to_string(index=False))

A few implementation notes. NHTSA's flat file naming conventions shift slightly between years, so the script uses a case-insensitive substring match on “accident” and “person” to locate the correct files within the ZIP rather than hardcoding a filename. The person type code for pedestrians is 5 (PER_TYP = 5); bicyclists are coded as 6 and other cyclists as 7. Riders of scooters and similar personal conveyances fall under a separate code (8 in recent coding manuals), which the script groups under “Other”. Filtering INJ_SEV = 4 in the person file gives the set of individuals who died rather than the crash-level FATALS count, enabling person-type disaggregation.

The Census ACS1 API returns single-year estimates for states with populations above 65,000 — all states qualify. Smaller geographies would require the ACS 5-year estimates. Population-normalizing the count is essential for state comparisons: New Mexico and Wyoming, which consistently rank among the highest pedestrian fatality rate states, have relatively small total populations but high rates driven by road design, speed limits, and driver-pedestrian interaction patterns in their urban areas.

Analytical Applications

Road safety policy evaluation. FARS's 50-year time series enables before-and-after analysis of safety interventions at the state and national level — the minimum drinking age, mandatory seatbelt laws, ignition interlock mandates, rumble strip deployment, intersection redesign programs. Because every state implements policies at different times, FARS supports difference-in-differences designs treating state-year as the unit of observation.
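The simplest form of that design is a two-group, two-period comparison. The sketch below computes it on an invented state-year panel; the states, rates, and treatment assignment are hypothetical, and a real analysis would add fixed effects, staggered adoption handling, and standard errors.

```python
import pandas as pd

# Hypothetical state-year panel: fatality rates per 100,000 residents.
panel = pd.DataFrame({
    "state":   ["A", "A", "B", "B", "C", "C", "D", "D"],
    "post":    [0, 1, 0, 1, 0, 1, 0, 1],  # 1 = after the policy date
    "treated": [1, 1, 1, 1, 0, 0, 0, 0],  # 1 = state adopted the policy
    "rate":    [12.0, 10.0, 14.0, 11.0, 11.0, 10.5, 13.0, 12.5],
})

# Mean rate in each of the four treated/post cells.
means = panel.groupby(["treated", "post"])["rate"].mean()
change_treated = means.loc[(1, 1)] - means.loc[(1, 0)]
change_control = means.loc[(0, 1)] - means.loc[(0, 0)]

# Difference-in-differences: the treated change net of the control change.
did_estimate = change_treated - change_control
print(f"difference-in-differences estimate: {did_estimate:+.2f}")
```

The estimate is only interpretable as a policy effect if treated and control states would have followed parallel trends without the policy.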

Highway design and infrastructure prioritization. Route type (Interstate, US, state, county, local) and the roadway function class field allow identification of which road classes disproportionately produce fatalities. Rural two-lane US routes consistently account for a share of fatalities well above their share of vehicle miles traveled, making them the focus of NHTSA's rural road safety initiatives.

Commercial vehicle and trucking safety. FARS vehicle type codes distinguish large trucks (combination vehicles and single-unit trucks) from passenger vehicles. Fatalities in crashes involving large trucks — whether the truck occupant, another vehicle occupant, or a pedestrian — are a distinct policy domain subject to FMCSA oversight and Hours of Service regulation.

Equity analysis. Age, sex, and — in fields added in recent FARS years — race and ethnicity of fatally injured persons are available in the person file. Pedestrian fatality rates differ substantially by race and income, reflecting disparities in where people walk, what roads they cross, and what speeds those roads carry. The person-level granularity in FARS enables these analyses without the sample-size constraints that limit survey-based approaches.

Longitudinal risk factor analysis. Linking FARS to vehicle registration data (by VIN-derived make, model, and model year) supports analysis of how crashworthiness improvements — electronic stability control, automatic emergency braking, improved structural design — have affected occupant fatality rates over time, controlling for crash severity.

Limitations and Practical Cautions

The 30-day rule introduces a definitional lag. A person who dies 29 days after a crash is counted in FARS; one who dies on day 31 is not. For crashes involving prolonged hospitalizations — common in severe traumatic brain injury — this creates a discontinuity at the threshold and means that FARS fatality counts will differ from hospital mortality statistics for the same crash events.
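The threshold can be expressed as a simple date comparison. In this sketch the function name is hypothetical and the handling of the exact 30-day boundary is an assumption; the Users Manual defines the precise rule.

```python
from datetime import datetime, timedelta

def counts_as_fars_fatality(crash_time: datetime, death_time: datetime) -> bool:
    """True if the death falls within 30 days of the crash.

    Boundary handling (exactly 30 days) is an assumption in this sketch.
    """
    elapsed = death_time - crash_time
    return timedelta(0) <= elapsed <= timedelta(days=30)

crash = datetime(2022, 3, 1, 14, 30)
day_29 = counts_as_fars_fatality(crash, crash + timedelta(days=29))
day_31 = counts_as_fars_fatality(crash, crash + timedelta(days=31))
print(day_29, day_31)
```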

The public road requirement excludes crashes on private property, parking lots, driveways, and off-road recreational areas. Fatalities in these locations — including a meaningful share of backing-related deaths — are captured in other data sources (notably CPSC injury databases) but not in FARS.

FARS depends on police crash report completion, and crash report quality varies by state and agency. Certain fields — particularly distraction, fatigue, and drug involvement other than alcohol — are believed to be substantially undercounted because police officers often cannot determine causation at the scene. NHTSA supplements the crash report data with medical examiner toxicology results, but toxicology is not universally available and introduces its own lag.

Multi-year trend analysis requires attention to coding changes. NHTSA periodically revises the FARS coding scheme to add fields, redefine values, or restructure supplemental files. Changes between the 2010 and 2020 editions of the Users Manual are substantial. Any longitudinal analysis spanning more than a few years should review the manual for each year's data to ensure consistent variable definitions.
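One common way to handle coding changes is to keep a recode table per coding era and map every year onto a shared set of labels. The sketch below shows the pattern only: the variable, its codes, and the 2010 cutover are invented for illustration and are not taken from the Users Manual.

```python
import pandas as pd

# Hypothetical recode tables; codes and cutover year are invented.
RECODES = {
    "old": {1: "clear", 2: "rain", 3: "snow"},                # before 2010
    "new": {1: "clear", 2: "rain", 4: "snow", 10: "cloudy"},  # 2010 onward
}

def harmonize(year: int, code: int) -> str:
    """Map a year-specific code onto a label shared across coding eras."""
    scheme = "old" if year < 2010 else "new"
    return RECODES[scheme].get(code, "unmapped")

records = pd.DataFrame({"YEAR": [2008, 2008, 2015, 2015], "CODE": [3, 2, 4, 3]})
records["label"] = [harmonize(y, c) for y, c in zip(records["YEAR"], records["CODE"])]
print(records)
```

Surfacing unmapped codes explicitly, instead of dropping them, makes it easier to spot a revision that the recode table has not caught up with.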


For complementary federal public safety data, the FBI National Instant Criminal Background Check System documents firearm transaction volumes by state and month — a distinct but related dimension of public safety risk. See FBI NICS: The Federal Firearm Background Check Dataset.

Federal risk datasets covering natural hazard losses complement traffic fatality data in understanding the full landscape of preventable mortality and property loss. The FEMA National Flood Insurance Program publishes every flood insurance claim since 1978. See NFIP Flood Insurance: The Federal Dataset Behind Every US Flood Loss.

Commercial driver employment — trucking, transit, and delivery — connects fatality risk to workforce patterns captured in the BLS Quarterly Census of Employment and Wages. See BLS QCEW: The County-Level Employment and Wages Dataset Behind Every Local Economic Analysis.