The Federal Railroad Administration maintains the National Highway-Rail Crossing Inventory — a federal record of every place in the United States where a road and a railroad meet, public or private, at grade or grade-separated. Each of the 250,636 crossings carries a unique seven-character DOT/AAR Crossing Inventory Number, and a companion accident database records every collision between a train and a highway user at those crossings, making the pair the foundation of US grade crossing safety analysis.
What it is
The National Highway-Rail Crossing Inventory is the Federal Railroad Administration's master register of every highway-rail crossing in the country. It is administered by the FRA Office of Safety Analysis and is the authoritative source for the location, physical configuration, and traffic characteristics of each crossing. The inventory covers all crossing types: public crossings (where a public roadway crosses the track), private crossings (farm roads, industrial access, private drives), and pedestrian crossings, in both at-grade form — where road and rail intersect on the same level — and grade-separated form, where one passes over or under the other on a bridge or in an underpass.
The organizing key of the entire system is the Crossing Inventory Number, often called the DOT number or the DOT/AAR crossing number. It is a seven-character identifier — six digits followed by a single alphabetic check character (for example, 845212U) — that uniquely and permanently identifies a single physical crossing. The numbering scheme is jointly maintained by the US Department of Transportation and the Association of American Railroads, which is why it carries both names. The DOT number is painted or posted on a metal tag at the crossing itself, on the signal bungalow or a crossbuck post, so that a motorist, a 911 caller, or a railroad employee at the scene can report the exact location of a stalled vehicle or an emergency by reading off the number. The Emergency Notification System signs that appear at modern crossings pair the DOT number with a toll-free railroad dispatch line for exactly this purpose.
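The identifier format described above (six digits plus one letter) can be checked mechanically before any join. The following is a minimal sketch, not an FRA-published routine: the function name `normalize_crossing_id` is hypothetical, and the check covers only the shape of the ID, not whether the letter is the correct check character for the digits.

```python
import re

# Shape described in the text: six digits followed by one letter (e.g. 845212U).
# This validates format only; it does not recompute the check character.
CROSSING_ID_RE = re.compile(r"[0-9]{6}[A-Z]")


def normalize_crossing_id(raw):
    """Return the cleaned 7-character ID, or None if the shape is wrong."""
    if raw is None:
        return None
    cleaned = str(raw).strip().upper()
    return cleaned if CROSSING_ID_RE.fullmatch(cleaned) else None


print(normalize_crossing_id(" 845212u "))  # cleaned and upper-cased
print(normalize_crossing_id("84521U"))     # too short, rejected
```

Rows that come back as None are worth counting separately, since a malformed ID cannot match anything on the other side of a join.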
Reporting to the inventory is mandated. Under the federal regulations implementing the rail safety statutes, railroads and state transportation agencies are jointly responsible for keeping crossing records current. The FRA issued a rule (codified at 49 CFR Part 234, Subpart F) requiring railroads and states to submit and periodically update inventory data for the crossings under their purview, with defined reporting cycles — new and changed crossings reported promptly and a full update of existing records on a recurring multi-year schedule. The division of labor is practical rather than tidy: the operating railroad typically holds the best information about track, train, and signal characteristics, while the state DOT or local road authority holds the best information about the roadway, traffic counts, and approach geometry. A complete inventory record is therefore a merge of what both parties know.
The schema concept
Each inventory record describes one crossing across several families of fields. Understanding these families is the key to using the data, because most analytical questions reduce to filtering, grouping, or joining on a handful of them.
Identity and location. The Crossing Inventory Number is the primary key. Location is captured as latitude and longitude, plus the street or highway name, the city or place, the county, and the state. A railroad milepost and a subdivision or branch name locate the crossing along the track. The geographic coordinates allow the inventory to be mapped directly and joined to other spatial datasets — census tracts, school locations, hospital service areas — though coordinate quality varies and older records sometimes carry coordinates derived from the nearest road centroid rather than a field survey.
Railroad and operations. The operating railroad is identified by its reporting mark and name; a crossing may list a primary operating railroad plus additional railroads with trackage rights over the same line. The number of daily train movements is recorded, frequently split into day-through, night-through, and switching movements, along with the number of passenger trains where applicable. The maximum timetable speed — the highest speed trains are authorized to operate over that segment — is a core field, because closing speed determines both the warning time available and the severity of a collision. Together these capture the rail side of the exposure equation.
Highway traffic. The roadway side records the Annual Average Daily Traffic (AADT), the count of vehicles using the road on a typical day, often paired with the year the traffic count was taken and an estimated percentage of trucks. The functional classification of the roadway, the number of traffic lanes, and whether the road is paved are recorded. AADT is the single most important highway field for risk analysis, because the probability of a vehicle being present when a train arrives scales with traffic volume.
Physical configuration. The record captures the number of tracks at the crossing (a multiple-track crossing introduces the second-train hazard, where a motorist proceeds after one train clears without seeing a second train approaching on an adjacent track), the crossing angle between road and rail, the crossing surface material (timber, asphalt, concrete, rubber, or composite panels), and the presence of nearby features such as an adjacent highway intersection or a commercial driveway close to the tracks.
Warning devices. This is the field that dominates safety analysis. The inventory codes the highest type of warning device present at the crossing, and the fundamental division is between passive and active devices. Passive devices give the motorist no dynamic indication of an approaching train: crossbucks (the familiar X-shaped “RAILROAD CROSSING” sign, which is the legal equivalent of a yield sign at every public crossing), stop signs, yield signs, pavement markings, and advance warning signs. The driver is responsible for looking, listening, and deciding whether it is safe to proceed. Active devices respond to the approach of a train: flashing lights, bells, and automatic gates that descend across the roadway, the highest standard of protection. Active systems are triggered by track circuits or constant-warning-time predictors that detect an approaching train and provide a consistent warning interval regardless of train speed. The inventory's warning-device code is what lets an analyst sort the national crossing stock into protection tiers and quantify where the gaps are.
The companion accident file
The inventory describes the crossings; the FRA Highway-Rail Grade Crossing Accident/Incident database describes what happens at them. This second dataset is a mandatory collection of every reportable collision between a railroad train (or other on-track equipment) and a highway user — a car, truck, bus, motorcycle, bicycle, pedestrian, or other vehicle — at a highway-rail crossing. Railroads file these reports on FRA Form 6180.57, and the records flow into the same FRA Office of Safety Analysis system that publishes the inventory.
Each accident record references the Crossing Inventory Number of the crossing where the collision occurred, which is precisely what makes the two datasets a matched pair: an analyst can attach the full physical and traffic profile of a crossing to every collision that ever happened there. The accident record itself captures the consequences and the circumstances. Consequence fields include the number of fatalities, the number of injuries, and whether the highway vehicle's occupants or a pedestrian were killed or hurt. Circumstance fields are extensive: the type of highway user involved (automobile, truck, truck-trailer, bus, pedestrian, and so on), the driver's action at the time (drove around or through the gates, stopped and then proceeded, did not stop, stalled on the crossing), the position of the vehicle (stalled or stopped on the crossing, moving over it), the weather and visibility, the time of day and lighting conditions, whether the train struck the vehicle or the vehicle struck the train, the train's speed at impact, and, critically, the type of warning device present and whether it was functioning at the time.
That last field is what allows the accident database to test the inventory's promise. If a crossing is coded in the inventory as gate-protected, the accident record can confirm whether the gates were down and the lights flashing when the collision occurred — distinguishing a device failure from a driver who went around lowered gates. The most common driver-action finding in fatal gate-crossing collisions is not equipment failure but deliberate violation: motorists driving around or through gates that were properly activated. The accident file is the evidentiary basis for that finding.
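The distinction drawn above (device failure versus violation of a working device) amounts to a two-way tabulation of accident records. The sketch below uses made-up records and hypothetical field names (`device_working`, `driver_action`); the real accident file codes these fields differently, so treat it as an outline of the logic rather than a parser for the FRA export.

```python
from collections import Counter

# Made-up records for illustration only; field names and values are hypothetical.
records = [
    {"device": "gates", "device_working": True,  "driver_action": "drove around gates"},
    {"device": "gates", "device_working": True,  "driver_action": "stalled on crossing"},
    {"device": "gates", "device_working": False, "driver_action": "did not stop"},
    {"device": "gates", "device_working": True,  "driver_action": "drove around gates"},
]


def categorize(rec):
    """Separate equipment failures from violations of a working device."""
    if not rec["device_working"]:
        return "device failure"
    if rec["driver_action"] == "drove around gates":
        return "violation of working device"
    return "other (device working)"


counts = Counter(categorize(r) for r in records)
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

The same grouping, applied to gate-protected crossings in the real file, is what supports or refutes the claim that violations outnumber equipment failures.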
It is essential to keep the grade crossing accident file separate from the FRA's other major safety dataset, the Rail Equipment Accident/Incident file (derailments, collisions between trains, and other train accidents reported on Form 6180.54). The two use different forms, apply different reporting thresholds, and cover different analytical populations. A grade crossing collision is reported on 6180.57 regardless of dollar damage if it meets the highway-rail collision definition; a derailment is reported on 6180.54 when monetary damage exceeds a periodically adjusted reporting threshold. Mixing the two inflates counts and confuses cause analysis.
Safety programs
Grade crossing safety in the United States is pursued through a small number of long-running programs, and the inventory and accident databases are the measurement infrastructure underneath all of them.
Operation Lifesaver. Founded in 1972 in Idaho and now a national nonprofit, Operation Lifesaver is the public-education arm of grade crossing safety. It trains volunteer presenters, runs awareness campaigns (the long-running “Stop, Trains Can't” and “See Tracks? Think Train.” messaging), and targets specific high-risk audiences such as professional drivers, school bus operators, and new drivers. Operation Lifesaver is an education-and-enforcement complement to the engineering programs: its premise is that since the majority of crossing collisions involve driver behavior at crossings that already have warning devices, changing behavior is as important as upgrading hardware.
The Section 130 program. The Railway-Highway Crossings Program — universally called Section 130 after its section in Title 23 of the US Code — is the federal funding mechanism for physical crossing safety improvements. Administered by the Federal Highway Administration and delivered through state DOTs, Section 130 is a set-aside within the Highway Safety Improvement Program that provides federal funds, typically at a 90 percent federal share, for installing and upgrading active warning devices, improving crossing surfaces and approaches, and — importantly — for closing redundant crossings. States are required to maintain a survey of their crossings and to set priorities for improvement, and a portion of Section 130 funds is reserved for the elimination of hazards including crossing closures. The inventory and the accident file are the raw material for the prioritization formulas: states rank candidate crossings using exposure (traffic times trains), collision history, train speed, and existing protection, and direct limited funds to the crossings where an upgrade buys the most risk reduction.
Quiet zones. A quiet zone is a stretch of rail line, designated under an FRA rule (49 CFR Part 222), where train horns are not routinely sounded at crossings. Because the train horn is itself a warning device, silencing it requires compensating safety measures — supplementary or alternative safety measures such as four-quadrant gates (which block both approach and departure lanes to prevent drivers from going around), medians or channelization devices that physically prevent crossing into the opposing lane around a gate, or one-way street configurations. Communities pursue quiet zones to reduce horn noise in residential areas, and the FRA evaluates whether the proposed measures keep the crossing corridor's risk at or below the level the horn provided. Quiet zone establishment is data-driven: the FRA uses a quantitative risk index built from inventory and accident data to determine whether a corridor qualifies.
The long decline, and the persistent floor. The combined effect of these programs over half a century has been dramatic. Grade crossing collisions in the United States have fallen substantially from their levels in the 1970s, when annual collisions numbered around 12,000 and crossing fatalities were far higher than today. Through Section 130 device upgrades, mass crossing closures, and sustained public education, the count of crossing collisions has dropped by more than three-quarters even as both highway and rail traffic grew. But the decline has flattened into a stubborn floor: in recent years the United States still records on the order of 2,000 highway-rail grade crossing collisions per year, killing several hundred people annually. The remaining incidents are dominated by driver behavior at crossings that already have active warning, the hardest category to engineer away, which is why education and closures, rather than yet more devices, increasingly drive the marginal safety gains.
Notable context
Crossing data after high-profile incidents. When a serious crossing collision occurs (a commuter train striking a vehicle stopped on the tracks, or a school bus involved), the inventory and accident records become the immediate factual baseline for investigators and the press. The DOT number ties the incident to the crossing's documented warning-device class, train speed, traffic volume, and prior collision history, and the NTSB, when it investigates a major crossing accident, builds on exactly these records. High-profile collisions involving commuter rail at suburban crossings have repeatedly driven interest in second-train warnings, longer gate arms, and quiet-zone compensating measures, and the post-incident analysis always starts from the federal crossing data.
Trespassing versus crossing fatalities. A crucial distinction for anyone using rail safety data is that grade crossing fatalities are not the largest category of rail-related deaths. Rail trespassing casualties — people struck while walking on or along the tracks away from a crossing — consistently exceed grade crossing fatalities and have for years. The two are counted in different FRA datasets and have different intervention strategies: crossing deaths are addressed through the inventory-driven engineering and education programs above, while trespasser deaths are addressed through fencing, right-of-way access control, and outreach. Conflating “crossing deaths” with “all rail-related deaths” is a common error; the grade crossing accident file specifically excludes trespasser incidents, which are reported separately.
The push to close redundant crossings. The single most effective grade crossing safety measure is elimination: a closed crossing has zero collision risk. Federal policy has long favored consolidation — closing lightly used crossings and routing their traffic to a nearby crossing with better protection, or grade-separating high-exposure crossings entirely. Section 130 funds and incentive payments support closures, and the FRA and railroads actively campaign to reduce the total crossing count. The inventory makes the redundancy visible: mapping crossings with their AADT and spacing reveals clusters of low-traffic crossings within short distances of each other, the prime candidates for consolidation. The slow, steady fall in the total number of public crossings over the decades is the closure program made measurable.
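The redundancy screen described above (low-traffic crossings close to one another) can be sketched with a great-circle distance and a pairwise comparison. Everything below is illustrative: the crossing IDs, coordinates, and AADT values are made up, and the thresholds `MAX_AADT` and `MAX_KM` are assumptions chosen for the example, not federal criteria.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0
MAX_AADT = 500   # illustrative "lightly used" cutoff, not a federal threshold
MAX_KM = 1.0     # illustrative spacing cutoff


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Made-up crossings for illustration only.
crossings = [
    {"id": "000001A", "lat": 41.0000, "lon": -93.0, "aadt": 150},
    {"id": "000002B", "lat": 41.0030, "lon": -93.0, "aadt": 90},
    {"id": "000003C", "lat": 41.0500, "lon": -93.0, "aadt": 120},
    {"id": "000004D", "lat": 41.0035, "lon": -93.0, "aadt": 8000},
]

# Keep only lightly used crossings, then test every pair for proximity.
low = [c for c in crossings if c["aadt"] <= MAX_AADT]
candidates = []
for a, b in combinations(low, 2):
    dist = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
    if dist <= MAX_KM:
        candidates.append((a["id"], b["id"], dist))

for first, second, dist in candidates:
    print(f"{first} and {second}: {dist:.2f} km apart")
```

The pairwise loop is quadratic in the number of crossings, so a national-scale run would need a spatial index or per-county batching; and because coordinate quality varies in the inventory, any flagged pair deserves a map check before it goes on a closure list.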
What you can do with it
The value of the paired datasets is that they turn crossing safety from anecdote into measurement. A handful of analyses recur across state DOTs, railroads, journalists, and researchers.
Rank the most dangerous crossings. The first-order question — where are the worst crossings? — has two valid answers depending on framing. Ranking by raw collisions per crossing identifies the crossings with the most incidents, but those tend simply to be the busiest. Ranking by an exposure-adjusted rate — collisions normalized by traffic and train volume — identifies crossings that are dangerous relative to how much conflict opportunity they present, which is the more useful signal for intervention. Both belong in any serious analysis.
Map warning-device gaps. Joining the inventory to population, school, and road data and filtering to passive-only public crossings on higher-speed or higher-traffic lines surfaces the crossings most in need of an active-device upgrade. This is the core input to a state's Section 130 candidate list.
Prioritize Section 130 spending. With a fixed annual budget, a state must choose which crossings to upgrade or close. Combining collision history, exposure, train speed, and current protection into a ranked priority index — the same logic the federal program requires — directs limited dollars to the highest-return projects and provides a defensible, data-backed basis for the choices.
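One way to picture a ranked priority index is a single score that multiplies the four ingredients named above. The sketch below is purely illustrative: the scoring function, the protection weights, and the 60 mph speed scaling are assumptions invented for the example, not the formula any state or the federal program uses, and the three crossings are made up.

```python
# Hypothetical weights: passive crossings count fully, active ones at half.
PROTECTION_WEIGHT = {"passive": 1.0, "active": 0.5}
REFERENCE_SPEED_MPH = 60  # arbitrary scaling constant for the example


def priority_score(c):
    """Illustrative score: history x exposure x speed x protection gap."""
    exposure = c["aadt"] * c["trains"]
    history = 1 + c["collisions"]          # +1 so zero-collision sites still score
    speed = c["max_speed_mph"] / REFERENCE_SPEED_MPH
    return history * exposure * speed * PROTECTION_WEIGHT[c["device_class"]]


# Made-up candidate crossings.
candidates = [
    {"id": "A", "aadt": 5000, "trains": 20, "collisions": 2,
     "max_speed_mph": 60, "device_class": "passive"},
    {"id": "B", "aadt": 20000, "trains": 30, "collisions": 0,
     "max_speed_mph": 30, "device_class": "active"},
    {"id": "C", "aadt": 300, "trains": 4, "collisions": 1,
     "max_speed_mph": 40, "device_class": "passive"},
]

ranked = sorted(candidates, key=priority_score, reverse=True)
for c in ranked:
    print(f"{c['id']}: {priority_score(c):,.0f}")
```

The value of writing the index down as code is less the particular weights than the audit trail: every ranking decision can be traced to inputs that come from the inventory and the accident file.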
Correlate speed and traffic with risk. Because the inventory carries maximum timetable speed and AADT and the accident file carries outcomes, the data supports direct study of how collision frequency and severity scale with train speed, traffic volume, number of tracks, and device class — the empirical questions behind every grade crossing engineering standard.
Python: joining the inventory to the accident file
The workflow below loads both FRA datasets, joins them on the Crossing Inventory Number, classifies each crossing's warning device as active or passive, and computes the collision rate by device class, the foundational grade crossing safety statistic. A third step ranks crossings by an exposure-adjusted risk index. The scripts detect columns by name (a list of known candidates for the crossing ID, substring matching for the other fields) because the FRA changes field names across release years. The published download URLs change as well; the URLs shown are placeholders to be replaced with the current export links from the FRA Office of Safety Analysis site.
Step 1: Load both datasets
import io

import pandas as pd
import requests

# ---------------------------------------------------------------------------
# FRA Highway-Rail Grade Crossing Inventory + Accident/Incident files
#
# Primary access points:
#   FRA Office of Safety Analysis: safetydata.fra.dot.gov
#   Crossing inventory downloads:  railroads.dot.gov (Highway-Rail Crossing
#                                  Inventory data)
#
# Two distinct datasets, joined on the 7-character Crossing ID:
#   1. Crossing Inventory - one row per physical crossing (~250,000 rows)
#                           location, railroad, traffic, warning device
#   2. Accident/Incident  - one row per collision between a train and a
#                           highway user; references the Crossing ID of
#                           the crossing where it occurred
#
# Both are published as flat CSV. Column names differ slightly across
# release years; normalize before joining.
# ---------------------------------------------------------------------------
INVENTORY_CSV = "https://example.fra.dot.gov/crossing_inventory.csv"  # current export URL
ACCIDENT_CSV = "https://example.fra.dot.gov/highway_rail_accidents.csv"


def fetch_csv(url):
    """Download a CSV and parse it, falling back to latin-1 if UTF-8 fails."""
    resp = requests.get(url, timeout=180)
    resp.raise_for_status()
    # FRA files contain non-ASCII characters in street/operator names.
    # Decode the raw bytes explicitly: resp.text substitutes replacement
    # characters instead of raising, so a fallback built on it never fires.
    try:
        text = resp.content.decode("utf-8")
    except UnicodeDecodeError:
        text = resp.content.decode("latin-1")
    return pd.read_csv(io.StringIO(text), low_memory=False)


inv = fetch_csv(INVENTORY_CSV)
acc = fetch_csv(ACCIDENT_CSV)

# Normalize column names
inv.columns = [c.strip().lower().replace(" ", "_") for c in inv.columns]
acc.columns = [c.strip().lower().replace(" ", "_") for c in acc.columns]

print(f"Inventory rows: {len(inv):,}")
print(f"Accident rows: {len(acc):,}")
Step 2: Join and compute rate by device class
import pandas as pd

# Assumes inv (inventory) and acc (accidents) are loaded and normalized.

# ---------------------------------------------------------------------------
# 1. Identify the Crossing ID join key in each frame.
#    The inventory typically calls it 'crossing' or 'crossingid';
#    the accident file typically calls it 'gxid'. A column named 'highway'
#    is not used as a candidate because it usually holds the road name.
# ---------------------------------------------------------------------------
def find_column(columns, candidates):
    """Return the first candidate name present in columns, else None."""
    return next((c for c in candidates if c in columns), None)


inv_id = find_column(inv.columns,
                     ("crossing", "crossingid", "crossing_id", "dot_crossing_no"))
acc_id = find_column(acc.columns,
                     ("gxid", "crossing", "crossingid", "crossing_id"))
if inv_id is None or acc_id is None:
    raise KeyError("Crossing ID column not found; inspect inv.columns and acc.columns")

inv = inv.rename(columns={inv_id: "crossing_id"})
acc = acc.rename(columns={acc_id: "crossing_id"})

# Strip whitespace; the 7-char ID is alphanumeric (6 digits + 1 check letter)
inv["crossing_id"] = inv["crossing_id"].astype(str).str.strip().str.upper()
acc["crossing_id"] = acc["crossing_id"].astype(str).str.strip().str.upper()

# ---------------------------------------------------------------------------
# 2. Classify each crossing's warning device as ACTIVE or PASSIVE.
#    The inventory codes the highest type of warning device present.
#    Active  = gates, flashing lights, highway signals, wigwags, bells.
#    Passive = crossbucks, stop signs, yield signs, other signs, none.
#    The term match below assumes text labels. If the column holds numeric
#    codes, map them to labels with the FRA data dictionary first, or
#    every crossing will be classified as passive.
# ---------------------------------------------------------------------------
dev_col = next((c for c in inv.columns
                if "wd_code" in c or "wdcode" in c
                or "warning" in c or "device" in c), None)
if dev_col is None:
    raise KeyError("Warning-device column not found; inspect inv.columns")

ACTIVE_TERMS = ("gate", "flashing", "light", "signal", "wigwag", "bell")


def classify(val):
    """Label a warning-device description as 'active' or 'passive'."""
    s = str(val).lower()
    return "active" if any(t in s for t in ACTIVE_TERMS) else "passive"


inv["device_class"] = inv[dev_col].map(classify)
print(inv["device_class"].value_counts())

# ---------------------------------------------------------------------------
# 3. Count collisions per crossing, then attach the device class.
# ---------------------------------------------------------------------------
collisions = (acc.groupby("crossing_id").size()
                 .rename("collisions").reset_index())
merged = inv.merge(collisions, on="crossing_id", how="left")
merged["collisions"] = merged["collisions"].fillna(0).astype(int)

# ---------------------------------------------------------------------------
# 4. Incident rate by warning-device class (collisions per 1,000 crossings).
# ---------------------------------------------------------------------------
by_class = merged.groupby("device_class").agg(
    crossings=("crossing_id", "nunique"),
    collisions=("collisions", "sum"),
)
by_class["per_1000"] = by_class["collisions"] / by_class["crossings"] * 1000

print("\nCollision rate by warning-device class:")
print(by_class.to_string(float_format=lambda x: f"{x:,.1f}"))
Step 3: Exposure-adjusted ranking
import pandas as pd

# Assumes 'merged' from the previous step (inventory + collision counts).

# ---------------------------------------------------------------------------
# Exposure-adjusted ranking.
#
# Raw collision counts favor busy crossings. To find genuinely DANGEROUS
# crossings, normalize by exposure = AADT (annual average daily traffic of
# vehicles) x daily train movements. This approximates the number of
# vehicle-train conflict opportunities per day.
# ---------------------------------------------------------------------------
# Skip companion columns such as the AADT count year when matching on "aadt".
aadt_col = next((c for c in merged.columns
                 if "aadt" in c and "year" not in c), None)
trains_col = next((c for c in merged.columns
                   if "totaltrn" in c or "trains" in c or "daytrn" in c), None)
if aadt_col is None or trains_col is None:
    raise KeyError("AADT or train-count column not found; inspect merged.columns")

merged["aadt"] = pd.to_numeric(merged[aadt_col], errors="coerce")
merged["trains"] = pd.to_numeric(merged[trains_col], errors="coerce")

# Exposure index; missing values become zero and are filtered out below
merged["exposure"] = merged["aadt"].fillna(0) * merged["trains"].fillna(0)

# Crossings with at least one recorded collision and positive exposure
hot = merged[(merged["collisions"] > 0) & (merged["exposure"] > 0)].copy()

# Collisions per million exposure-units (rough risk index)
hot["risk_index"] = hot["collisions"] / (hot["exposure"] / 1_000_000)

top = hot.sort_values("risk_index", ascending=False).head(15)
cols = ["crossing_id", "collisions", "aadt", "trains",
        "device_class", "risk_index"]
cols = [c for c in cols if c in top.columns]

print("Top 15 crossings by exposure-adjusted risk index:")
print(top[cols].to_string(index=False,
                          float_format=lambda x: f"{x:,.2f}"))
The join is the load-bearing step. Because the Crossing Inventory Number is the shared key, normalizing it on both sides (stripping whitespace and upper-casing the check character) is essential; a mismatch in formatting silently drops every accident at the affected crossings. The classification of warning devices into active and passive is the analytical pivot that produces the headline result: passive-only crossings, despite carrying far less traffic on average, account for a markedly disproportionate share of collisions and produce more fatalities per collision, because the motorist receives no dynamic warning and the absence of gates removes the physical and psychological barrier to proceeding. That single comparison, rate per crossing for active versus passive devices, is the empirical argument for the entire Section 130 upgrade program.
Caveats
Inventory update lag. The inventory is only as current as its last update. Although reporting is mandated on a cycle, a crossing that was recently upgraded from crossbucks to gates, or recently closed, may still carry its old record for a period. Traffic counts (AADT) in particular can be years old, and a stale AADT distorts any exposure calculation. Analysts should check the inventory record's update or revision date and treat very old records, especially their traffic and device fields, with appropriate skepticism.
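A simple staleness flag makes this caveat operational. The sketch below assumes a hypothetical `aadt_year` field holding the year of the traffic count, uses made-up records, and picks a ten-year cutoff purely for illustration; the appropriate cutoff depends on how fast traffic is changing in the area under study.

```python
MAX_AADT_AGE_YEARS = 10  # illustrative cutoff, not an FRA rule
AS_OF_YEAR = 2024        # fixed here so the example is reproducible


def stale_aadt(aadt_year, as_of_year=AS_OF_YEAR):
    """True when the traffic count is missing or older than the cutoff."""
    if aadt_year is None:
        return True
    return (as_of_year - aadt_year) > MAX_AADT_AGE_YEARS


# Made-up records; 'aadt_year' is a hypothetical field name.
records = [
    {"crossing_id": "000001A", "aadt": 1200, "aadt_year": 2021},
    {"crossing_id": "000002B", "aadt": 450, "aadt_year": 1998},
    {"crossing_id": "000003C", "aadt": 80, "aadt_year": None},
]

flagged = [r["crossing_id"] for r in records if stale_aadt(r["aadt_year"])]
print("Stale or missing traffic counts:", flagged)
```

Flagged crossings can either be excluded from exposure-based rankings or reported in a separate tier so that an outdated count does not masquerade as a precise rate.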
Private-crossing under-reporting. Public crossings are well documented because public road authorities and railroads both have clear reporting responsibility. Private crossings — farm and industrial access, private driveways — are systematically less complete. Some private crossings are missing from the inventory entirely, and those that are present often have sparse traffic and device data. Any analysis of total crossing counts, or of the private crossing population specifically, must acknowledge that the private side of the inventory is the weaker half. Collisions at unreported private crossings can also complicate the join, since the accident may reference a crossing ID that has no matching inventory record.
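Accidents that reference a crossing ID with no inventory record can be surfaced with an anti-join before any rates are computed. The sketch below uses small made-up frames (`inv_demo`, `acc_demo`, both hypothetical names) and pandas' `indicator=True` merge option to list the orphaned IDs and the share of collisions they represent.

```python
import pandas as pd

# Made-up frames for illustration; real data would come from the FRA exports.
inv_demo = pd.DataFrame({"crossing_id": ["000001A", "000002B", "000003C"]})
acc_demo = pd.DataFrame(
    {"crossing_id": ["000001A", "000001A", "999999Z", "000003C"]}
)

# Collisions per crossing ID, as in the main workflow.
per_crossing = (acc_demo.groupby("crossing_id").size()
                        .rename("collisions").reset_index())

# Left-join accidents onto the inventory and keep rows with no match.
check = per_crossing.merge(inv_demo, on="crossing_id", how="left", indicator=True)
orphans = check[check["_merge"] == "left_only"]

orphan_share = orphans["collisions"].sum() / per_crossing["collisions"].sum()
print(orphans[["crossing_id", "collisions"]].to_string(index=False))
print(f"Share of collisions with no inventory match: {orphan_share:.0%}")
```

A large orphan share usually points to a formatting mismatch in the key rather than to missing crossings, so it is worth inspecting a sample of the unmatched IDs before attributing the gap to private-crossing under-reporting.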
Exposure is required to normalize raw counts. The most important interpretive caveat is that raw collision counts are nearly meaningless as a ranking of danger. A crossing with five collisions on a road carrying 40,000 vehicles a day crossed by 60 trains is not necessarily more dangerous than a crossing with two collisions on a remote road carrying 200 vehicles crossed by four trains: the second may have a far higher collision rate per unit of exposure. Normalizing by exposure (AADT multiplied by daily trains) is what converts a count into a risk measure. Equally, collisions are rare events at the level of any single crossing, so small-number noise dominates: a crossing with one collision in a decade should not be ranked confidently against another with two. Sound analysis pools crossings into device and exposure classes for stable rate estimates rather than over-interpreting the history of any individual crossing. These are the same principles (exposure normalization and consequence thresholds) that govern the other federal transportation safety databases, and the FRA crossing data rewards the same disciplined treatment.
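The two crossings in the example above can be worked through directly. The short sketch below applies the same collisions-per-million-exposure-units index used in the ranking step; the function name is hypothetical, and the inputs are the figures quoted in the paragraph.

```python
def collisions_per_million_exposure(collisions, aadt, daily_trains):
    """Collisions divided by (AADT x daily trains), scaled per million."""
    exposure = aadt * daily_trains
    return collisions / (exposure / 1_000_000)


busy = collisions_per_million_exposure(5, 40_000, 60)   # exposure 2,400,000
remote = collisions_per_million_exposure(2, 200, 4)     # exposure 800

print(f"busy crossing:   {busy:,.2f}")
print(f"remote crossing: {remote:,.2f}")
print(f"ratio:           {remote / busy:,.0f}x")
```

The remote crossing's index comes out roughly 1,200 times higher, yet it rests on only two collisions, which is exactly why the small-number caveat applies: the index is informative when pooled across a class of crossings and fragile for any single one.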