
FAA Aviation Safety Data: The Federal Databases Behind Every Plane Crash Investigation

· 16 min read· AI Analytics

Every civil aviation accident in the United States generates a paper trail across at least four federal databases. The National Transportation Safety Board investigates and publishes every accident and selected incidents. The FAA's own Accident/Incident Data System captures a broader set of occurrences. NASA administers a confidential voluntary reporting system where pilots and controllers describe near-misses that would otherwise go unrecorded. And the FAA Wildlife Strike Database catalogs every reported bird and animal strike going back decades. Together, these systems constitute the most comprehensive civil aviation safety recordkeeping apparatus in the world — and all of it is publicly available.

The four-database ecosystem

US aviation safety data sits across four distinct federal systems, each with a different reporting obligation, scope, and analytical purpose.

NTSB Aviation Accident Database. The National Transportation Safety Board investigates every civil aviation accident in the United States and publishes a structured record of each. The database's records begin in 1962, predating the Board's own establishment in 1967. It holds more than 90,000 accidents and incidents and grows by roughly 1,300 to 1,500 records per year. Reporting is mandatory: operators and pilots are required by regulation to report accidents to the NTSB, and every reported accident receives at minimum a field-office investigation. Fatal accidents involving Part 121 commercial carriers receive full go-team deployments with hundreds of pages of docket material.

FAA Accident/Incident Data System (AIDS). The FAA's own accident and incident database, known by its acronym AIDS, captures both accidents (already covered by NTSB) and a broader population of incidents that did not rise to the NTSB's investigation threshold. AIDS records include runway incursions, airspace deviations, near-midair collisions, aircraft system malfunctions, and other operational anomalies reported by pilots, air traffic controllers, and airport personnel under FAA reporting requirements. The FAA uses AIDS internally for trend detection and enforcement; a publicly accessible version is available through the FAA data portal at av-info.faa.gov.

Aviation Safety Reporting System (ASRS). Operated by NASA under a Memorandum of Agreement with the FAA since 1976, ASRS is the confidential, non-punitive voluntary safety reporting program for aviation. Pilots, air traffic controllers, flight attendants, mechanics, and dispatchers submit written reports describing safety concerns, procedural deviations, and near-misses. The database holds more than one million reports. Reporters receive a limited waiver of FAA enforcement action if they submit an ASRS report within ten days of an incident, which creates a structural incentive to participate. Confidentiality is the key design feature: ASRS captures categories of safety information that would never appear in a mandatory reporting system, because reporters would face consequences for self-disclosure.

FAA Wildlife Strike Database. The FAA Wildlife Strike Database, maintained at wildlifestrike.faa.gov, is the most comprehensive database of bird and animal strikes with aircraft in the world. Reporting is voluntary but participation is high because airlines, airports, and pilots have strong incentives to document strikes for insurance, maintenance, and liability purposes. The database records approximately 17,000 strikes per year and contains over 300,000 historical records. It is the primary data source behind the FAA's wildlife hazard management program and is publicly downloadable as a flat file.

NTSB accident definitions and investigation scope

The NTSB's jurisdiction extends to all civil aviation accidents in the United States and to selected incidents. The statutory definitions matter because they determine what enters the database and what does not.

An accident is an occurrence associated with the operation of an aircraft in which any person suffers death or serious injury, or in which the aircraft receives substantial damage. Substantial damage has a specific regulatory meaning under 49 CFR 830.2: damage or structural failure that adversely affects the structural strength, performance, or flight characteristics of the aircraft, and that would normally require major repair or replacement of the affected component. The regulation also lists exclusions, among them bent fairings or cowling, dented skin, ground damage to propeller blades, and damage to landing gear. A gear-up landing therefore qualifies only when the damage extends to primary structure, and a bird strike that dents the leading edge without structural consequence does not qualify. The threshold matters because accidents receive full NTSB investigation records, so where the line is drawn shapes the composition of the database.

An incident is an occurrence other than an accident that affects or could affect the safety of operations. NTSB regulations (49 CFR Part 830) designate certain incident types as “serious incidents” requiring immediate NTSB notification: flight control system malfunctions, inflight crew incapacitation, fires, structural failures, near-collisions requiring evasive action, and others. Serious incidents receive more thorough NTSB attention than routine incidents.
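
The accident and incident definitions above reduce to a simple decision rule. The sketch below is illustrative only: the function name and its boolean inputs are hypothetical, and in practice each flag stands for a regulatory judgment (what counts as serious injury or substantial damage) rather than a simple yes or no.

```python
def classify_occurrence(death_or_serious_injury: bool,
                        substantial_damage: bool,
                        affects_safety: bool = True) -> str:
    """Hypothetical helper applying the definitions in the text above."""
    # Either condition alone is enough to make the occurrence an accident.
    if death_or_serious_injury or substantial_damage:
        return "accident"
    # Anything else that affects, or could affect, safety is an incident.
    if affects_safety:
        return "incident"
    return "neither"


# A dented leading edge with no structural consequence and no injuries:
print(classify_occurrence(False, False))  # incident
```

Keeping the two accident conditions in a single `or` mirrors the definition: either one is sufficient on its own.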

General aviation — private, recreational, and non-scheduled commercial flying under Part 91 and Part 135 — accounts for the overwhelming majority of NTSB accident records, roughly 90 percent by count and an even larger share by fatal accident count. Commercial air carrier operations under Part 121 account for a small fraction of accidents but receive the most investigative attention and public scrutiny. Scheduled US carriers recorded zero passenger fatalities in most years between 2009 and 2024, a safety record without precedent in the history of US commercial aviation.

Accident taxonomy: probable cause and phase of flight

Every NTSB accident investigation produces two outputs of analytical value: a probable cause determination and a phase-of-flight coding.

The probable cause is NTSB's formal finding — a narrative supported by the investigative record identifying the factor or factors that caused the accident. It is accompanied by contributing factors, which document conditions that influenced the outcome without being its primary cause. For general aviation, the dominant probable cause categories are:

  • Pilot error — accounts for more than 70 percent of general aviation accident probable causes. Subcategories include failure to maintain aircraft control, improper decision making, failure to follow procedures, spatial disorientation, and VFR flight continued into instrument meteorological conditions (IMC). This last scenario — VFR-into-IMC — carries a fatality rate exceeding 90 percent and is the single deadliest accident pattern in general aviation.
  • Mechanical failure — accounts for 15 to 20 percent of accidents. Engine failures dominate this category, including power loss for undetermined reasons. Fuel exhaustion and fuel starvation also end in power loss, but they are a pilot error variant rather than a true mechanical failure. Fuel exhaustion (running the tanks dry) appears in aircraft of all vintages, which is itself a finding about the limits of maintenance as a safety lever.
  • Weather — figures as a contributing factor more often than as a standalone probable cause, because most weather-related accidents also involve a pilot decision to depart into or continue into deteriorating conditions, which is itself a pilot error finding.
  • Controlled flight into terrain (CFIT) — occurs when an airworthy aircraft under crew control is flown into terrain, water, or obstacles. CFIT has historically been one of the leading killers in commercial aviation globally and remains a significant category in domestic general aviation, particularly at night and in mountainous terrain.
  • Loss of control in flight (LOC-I) — the aerodynamic loss of controlled flight, typically through stall/spin, upset, or exceeding the aircraft's structural limits. LOC-I accounts for a disproportionate share of fatal general aviation accidents relative to its frequency.

Phase of flight is coded on a standardized taxonomy: standing, taxi, takeoff, initial climb, en route, maneuvering, approach, landing, and other. The distribution of accidents across phases is not proportional to the time spent in each phase. Takeoff, initial climb, approach, and landing are overrepresented relative to the brief time they consume in a typical flight. Maneuvering — low-altitude aerobatics, terrain following, and buzzing — produces a disproportionate share of fatal accidents for its relatively infrequent occurrence. En route cruise is underrepresented: it is the longest phase but produces fewer accidents per flight hour than the transitions around airports.
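
One way to quantify that over- and under-representation is to divide each phase's share of accidents by its share of flight time. A ratio above 1 means the phase produces more accidents than its duration alone would predict. The sketch below assumes both shares come from your own analysis. The numbers shown are placeholders chosen for illustration, not NTSB figures, and the function name is hypothetical.

```python
def exposure_ratio(accident_share: dict, time_share: dict) -> dict:
    """Accident share divided by time share, per phase (hypothetical helper)."""
    return {
        phase: accident_share[phase] / time_share[phase]
        for phase in accident_share
        if time_share.get(phase)  # skip phases with no exposure estimate
    }


# PLACEHOLDER values for illustration only; not taken from any dataset.
accident_share = {"TAKEOFF": 0.15, "EN ROUTE": 0.10, "LANDING": 0.35}
time_share = {"TAKEOFF": 0.02, "EN ROUTE": 0.60, "LANDING": 0.03}

ratios = exposure_ratio(accident_share, time_share)
for phase, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    label = "over" if ratio > 1 else "under"
    print(f"{phase}: {ratio:.1f}x ({label}represented)")
```

The hard part in practice is the denominator: the accident database does not record time spent per phase, so any exposure estimate has to come from a separate source.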

The Boeing 737 MAX crisis and the FAA oversight data trail

The Boeing 737 MAX investigation illustrates how federal aviation safety data systems interact in a major investigation. Lion Air Flight JT610 crashed into the Java Sea on October 29, 2018, killing all 189 people aboard. Ethiopian Airlines Flight ET302 crashed near Addis Ababa on March 10, 2019, killing all 157. Both aircraft were 737 MAX 8s; both crashes occurred in similar circumstances shortly after takeoff; both involved the MCAS — Maneuvering Characteristics Augmentation System — a software system that repeatedly pushed the nose down in response to faulty angle-of-attack sensor readings. The combined death toll was 346.

Because the crashes occurred outside the United States, the NTSB participated as an accredited representative rather than as lead investigator. The Indonesian National Transportation Safety Committee led the JT610 investigation; the Ethiopian Aircraft Accident Investigation Bureau led the ET302 investigation. The NTSB provided substantial technical support and published its own safety recommendations. The flight data recorders and cockpit voice recorders from both aircraft were recovered and analyzed; the NTSB published the FDR data alongside its safety recommendations, giving the public an unprecedented view into the final minutes of both flights.

The FAA's own investigation, combined with the investigation by the House Committee on Transportation and Infrastructure, produced a documentary record that exposed the certification process failures behind MCAS. Internal FAA communications and Boeing's certification submissions became part of the public record through congressional disclosure. The FAA's Aircraft Certification Office had delegated significant portions of the 737 MAX certification to Boeing itself under the Organization Designation Authorization program, a delegation framework designed to reduce FAA workload that the House investigation characterized as regulatory capture. The 737 MAX was grounded worldwide in March 2019. In the United States the grounding lasted 20 months, until November 2020, the longest grounding of a US-certificated aircraft in history. The return-to-service process generated hundreds of pages of FAA correspondence and certification documentation now available through the docket.

Aviation Safety Reporting System: the NASA-administered confidential database

The Aviation Safety Reporting System occupies a unique position in the federal safety data ecosystem. Unlike the NTSB database, which records accidents that have already happened, or the FAA AIDS system, which captures incidents with mandatory reporting triggers, ASRS is designed to capture the near-misses, procedural deviations, and systemic safety concerns that reporters would conceal if the information could be used against them.

The confidentiality mechanism is structural. NASA, not the FAA, operates the system under a Memorandum of Agreement. NASA receives reports, strips all identifying information, and transmits de-identified data to the FAA. The FAA agreed, as a condition of the program, not to use ASRS data in enforcement actions against reporters who submit within the ten-day window. The FAA's own enforcement manual codifies this waiver. An air traffic controller who makes a separation error and files an ASRS report within ten days cannot have that specific error used as the basis for FAA enforcement action.
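
The ten-day window is simple date arithmetic. The sketch below is a minimal illustration, not the FAA's or NASA's actual eligibility logic. The function name is hypothetical, and counting the tenth calendar day as inside the window is an assumption made here.

```python
from datetime import date

ASRS_WINDOW_DAYS = 10  # ten-day window described in the text


def within_asrs_window(event_date: date, report_date: date) -> bool:
    """True if the report was filed on or after the event, within ten days."""
    elapsed = (report_date - event_date).days
    # Assumption: day 10 itself still counts as inside the window.
    return 0 <= elapsed <= ASRS_WINDOW_DAYS


print(within_asrs_window(date(2024, 3, 1), date(2024, 3, 11)))  # True
print(within_asrs_window(date(2024, 3, 1), date(2024, 3, 12)))  # False
```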

The result is a database of more than one million reports describing the safety environment from the inside: runway incursions that nearly resulted in collisions but were resolved without contact, clearance readback errors that went uncorrected until another party intervened, fatigue-induced procedural deviations by flight crew, and maintenance practices that deviated from the approved manual but did not produce detectable outcomes. ASRS researchers have used this data to identify precursor patterns to accidents — categories of near-miss that appear in the voluntary reports before they materialize as accident statistics in the NTSB database.

The full ASRS database is publicly searchable at asrs.arc.nasa.gov, with report text available for all de-identified records. Bulk database downloads are available for research purposes. The database supports full-text search across the report narratives, making it possible to isolate reports describing specific scenarios — runway incursions at a particular airport, TCAS resolution advisories in specific airspace, or fatigue-related events on specific route structures.
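
Once a set of reports has been exported, the same kind of scenario search can be run locally with a regular expression. This is a sketch under stated assumptions: the `narrative` field name and the sample records are invented for illustration and do not reflect the actual ASRS export layout.

```python
import re


def search_narratives(reports, pattern, field="narrative"):
    """Return the reports whose narrative text matches a regex (case-insensitive)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [r for r in reports if rx.search(r.get(field) or "")]


# Invented sample text; real narratives come from an ASRS export.
sample_reports = [
    {"id": 1, "narrative": "Crossed the hold short line; tower sent landing traffic around."},
    {"id": 2, "narrative": "TCAS RA during descent; complied and advised ATC."},
    {"id": 3, "narrative": "Fatigue after the fourth leg; missed a crossing restriction."},
]

hits = search_narratives(sample_reports, r"\bTCAS\s+(RA|resolution advisory)\b")
print([r["id"] for r in hits])  # [2]
```

Matching both the abbreviation and the spelled-out term matters because the narratives are free text and reporters do not use a consistent vocabulary.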

Runway incursions and ground safety programs

A runway incursion is any occurrence at an airport involving the incorrect presence of an aircraft, vehicle, or person on the protected area of a surface designated for the landing and takeoff of aircraft. The FAA categorizes runway incursions by severity on a four-level scale:

  • Category A — a serious incident in which a collision was narrowly avoided. Separation was insufficient and only immediate action by flight crew or controller prevented a collision. Category A incursions are relatively rare but receive full investigation.
  • Category B — a significant incident in which separation decreases and there is a significant potential for collision, which may result in a time-critical corrective or evasive response.
  • Category C — an incident characterized by ample time and distance to avoid a collision.
  • Category D — an incident that meets the definition of a runway incursion such as incorrect presence of a single vehicle, person, or aircraft, but with no immediate safety consequence.
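
The four-level scale above maps naturally onto an enumeration, which makes it easy to tally events by severity. The sketch below is illustrative: the class and function names are hypothetical, the event list is made up, and grouping A and B as the high-severity pair is an assumption of this example.

```python
from collections import Counter
from enum import Enum


class IncursionSeverity(Enum):
    A = "serious incident; collision narrowly avoided"
    B = "significant potential for collision"
    C = "ample time and distance to avoid a collision"
    D = "incursion with no immediate safety consequence"


def is_high_severity(category: str) -> bool:
    """Assumption for this sketch: A and B form the high-severity group."""
    return IncursionSeverity[category.strip().upper()] in (
        IncursionSeverity.A,
        IncursionSeverity.B,
    )


# Made-up category codes standing in for a set of event records.
events = ["D", "C", "c", "A", "D", "B", "D"]
counts = Counter(IncursionSeverity[e.upper()].name for e in events)
high = sum(is_high_severity(e) for e in events)
print(dict(counts), f"high-severity: {high}/{len(events)}")
```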

A significant increase in Category A and B runway incursions during 2022 and 2023 led the FAA to convene a nationwide safety review with airlines, airports, and pilot unions. High-profile incidents at Austin-Bergstrom International, JFK, and other major airports entered public attention and congressional scrutiny. The FAA responded with enhanced controller training, additional surface detection technology deployment, and an expanded focus on Runway Status Lights (RWSL) systems.

Two voluntary safety programs complement the mandatory reporting framework at the airline level. Aviation Safety Action Programs (ASAPs) are airline-specific programs where employees report safety events to an Event Review Committee that includes FAA representation, without fear of enforcement action for good-faith reports. Flight Operational Quality Assurance (FOQA) programs aggregate de-identified digital flight data from every flight and analyze it for trend exceedances — unstabilized approaches, hard landings, airspeed deviations — at the fleet level. Airlines share FOQA data with the FAA under agreements that protect the de-identified data from enforcement use. Both programs produce safety trend intelligence invisible in any publicly available dataset.

Wildlife strikes: the Miracle on the Hudson dataset

On January 15, 2009, US Airways Flight 1549 departed LaGuardia Airport and struck a flock of Canada geese at approximately 2,800 feet altitude, 95 seconds after takeoff. Both engines ingested birds and lost thrust. Captain Chesley Sullenberger and First Officer Jeffrey Skiles ditched the aircraft in the Hudson River; all 155 people aboard survived. The dual loss of thrust came at an altitude and airspeed combination that gave the crew no viable option for return to LaGuardia or diversion to Teterboro. The accident produced an NTSB investigation, a congressional hearing, and substantial changes to FAA wildlife hazard management policy at airports near bird migration corridors.

The FAA Wildlife Strike Database records approximately 17,000 strikes per year. Fields include aircraft type and airline, airport of occurrence, strike height above ground, phase of flight, species or species group, aircraft damage assessment (none, minor, substantial, destroyed), number of birds struck, whether birds were ingested into engines, and human injuries if any. The Smithsonian Institution's Feather Identification Laboratory assists in identifying species from remains or feathers recovered from aircraft engines or airframes, providing the species-level data that informs wildlife hazard management decisions at specific airports.

Engines are the primary damage concern: bird ingestion into a turbofan engine can cause compressor blade damage, combustion chamber contamination, or full power loss. Windshield strikes from large birds at high speed can penetrate the windshield and incapacitate crew. The database enables analysis of which bird species are most hazardous by size and frequency, which airports face the highest strike risk, and which aircraft types are most vulnerable. The FAA uses this data to set airworthiness standards for engine bird-ingestion testing and to require wildlife hazard management plans at certificated airports near high-density wildlife areas.
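
A species-level hazard ranking of the kind described above can be approximated by computing, for each species, the share of its reported strikes that caused serious damage. The sketch below is a minimal illustration: the `species` and `damage` field names and the sample records are invented, and treating only "substantial" and "destroyed" as damaging is an assumption.

```python
from collections import defaultdict

# Assumption: only these two damage levels count as "damaging" here.
DAMAGING_LEVELS = {"substantial", "destroyed"}


def damaging_share_by_species(records):
    """Share of each species' reported strikes that caused serious damage."""
    totals = defaultdict(int)
    damaging = defaultdict(int)
    for rec in records:
        species = rec["species"]
        totals[species] += 1
        if rec["damage"].strip().lower() in DAMAGING_LEVELS:
            damaging[species] += 1
    return {sp: damaging[sp] / totals[sp] for sp in totals}


# Invented records for illustration; real rows come from the FAA download.
sample_strikes = [
    {"species": "Canada goose", "damage": "Substantial"},
    {"species": "Canada goose", "damage": "Minor"},
    {"species": "Mourning dove", "damage": "None"},
    {"species": "Mourning dove", "damage": "None"},
]

shares = damaging_share_by_species(sample_strikes)
for species, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{species}: {share:.0%} of strikes damaging")
```

Because reporting is voluntary, a share computed this way describes reported strikes only; undamaging strikes are the ones most likely to go unreported.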

Airspace operations and BTS on-time performance data

Beyond accident and incident data, the federal aviation data ecosystem includes extensive operational performance datasets that provide context for safety analysis.

The FAA's Aviation System Performance Metrics (ASPM) database tracks delay, throughput, and efficiency for major US airports and the national airspace system as a whole. ASPM captures scheduled versus actual gate departure and arrival times, airborne times, taxi times, and National Airspace System delay causes for instrument flight rule operations at approximately 80 major airports. The data is available through the FAA's ASPM web portal at aspm.faa.gov.

The Bureau of Transportation Statistics publishes on-time performance data for approximately 14 major domestic carriers under reporting thresholds set by the DOT. The BTS data uses OOOI timestamps — Out of gate, Off ground, On ground, Into gate — to track the four key events in a commercial flight's timeline. Delay cause codes distinguish carrier-caused delays, National Airspace System delays, weather delays, security delays, and late-aircraft delays (where the inbound aircraft arrived late, cascading the delay to the next flight). This data is publicly available through the BTS TranStats portal and is the basis for the DOT's monthly Air Travel Consumer Report.
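
The four OOOI events divide a flight into taxi-out, airborne, and taxi-in segments, and the gate-to-gate span is the block time. The sketch below shows that arithmetic. The function name and the example timestamps are hypothetical, and all four times are assumed to be in the same time zone.

```python
from datetime import datetime


def oooi_durations(out: datetime, off: datetime, on: datetime, into: datetime) -> dict:
    """Segment durations in minutes from the four OOOI timestamps."""

    def minutes(start: datetime, end: datetime) -> float:
        return (end - start).total_seconds() / 60

    return {
        "taxi_out_min": minutes(out, off),   # Out of gate -> Off ground
        "airborne_min": minutes(off, on),    # Off ground -> On ground
        "taxi_in_min": minutes(on, into),    # On ground -> Into gate
        "block_min": minutes(out, into),     # gate to gate
    }


# Hypothetical flight, all times in one time zone.
segments = oooi_durations(
    out=datetime(2024, 5, 1, 8, 0),
    off=datetime(2024, 5, 1, 8, 18),
    on=datetime(2024, 5, 1, 10, 3),
    into=datetime(2024, 5, 1, 10, 10),
)
print(segments)
```

Separating the segments shows where a delay accrued: a long taxi-out points at the departure airport's surface, while a long airborne time points at en route or arrival constraints.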

FAA Civil Aviation Registry: the N-number aircraft database

Every civil aircraft registered in the United States receives an N-number — the alphanumeric tail number beginning with the letter N that identifies the aircraft in all regulatory contexts. The FAA Civil Aviation Registry records every such registration. The registry currently contains more than 200,000 active aircraft registrations and is freely downloadable in bulk CSV format from faa.gov/licenses_certificates/aircraft_certification/aircraft_registry.

Registry fields include the N-number, the aircraft serial number (the manufacturer's unique identifier), manufacturer name, aircraft model, year manufactured, registrant name and address, airworthiness certificate type, and certificate issue date. Airworthiness certificate types include Standard (factory-built, type-certificated aircraft), Experimental (amateur-built, kit-built, and research aircraft), Restricted (agricultural and other special-purpose aircraft), Limited (surplus military aircraft converted to civilian use), and Primary (a simplified category for simple aircraft). The Experimental category is analytically significant: experimental aircraft operate under less restrictive maintenance requirements than Standard certificate aircraft, and their accident rate per flight hour is substantially higher.

The N-number is the join key between the FAA registry and the NTSB accident database. Every NTSB accident record that involves a US-registered aircraft includes the N-number, enabling a join to registry data that adds aircraft age, engine type, and registered owner without those fields needing to be collected at the accident scene. The fleet-age distribution visible in this join is striking: the median age of active general aviation aircraft in the US exceeds 40 years. Most of the Cessna 172s, Piper Cherokee variants, and Beechcraft Bonanzas in active service were manufactured in the 1960s through 1980s.
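
A minimal sketch of that join in pandas follows. The column names (`REGISTRATION_NUMBER`, `N_NUMBER`, `YEAR_MFR`, `EVENT_YEAR`) and the sample rows are assumptions for illustration. The normalization step also assumes the registry stores the number without its leading "N" while accident records include it, which should be checked against the actual files.

```python
import pandas as pd


def normalize_n_number(value) -> str:
    """Upper-case, strip whitespace, and drop a leading 'N' so both sides match."""
    s = str(value).strip().upper()
    return s[1:] if s.startswith("N") else s


# Invented rows standing in for the NTSB and FAA registry tables.
accidents = pd.DataFrame({
    "REGISTRATION_NUMBER": ["N12345", " n678ab", "C-GABC"],
    "EVENT_YEAR": [2021, 2022, 2022],
})
registry = pd.DataFrame({
    "N_NUMBER": ["12345", "678AB"],
    "YEAR_MFR": [1975, 1968],
})

accidents["N_KEY"] = accidents["REGISTRATION_NUMBER"].map(normalize_n_number)
registry["N_KEY"] = registry["N_NUMBER"].map(normalize_n_number)

# Left join keeps every accident; indicator=True flags rows with no registry match.
joined = accidents.merge(registry, on="N_KEY", how="left", indicator=True)
joined["AGE_AT_EVENT"] = joined["EVENT_YEAR"] - joined["YEAR_MFR"]
print(joined[["REGISTRATION_NUMBER", "YEAR_MFR", "AGE_AT_EVENT", "_merge"]])
```

A left join with the indicator column makes unmatched records visible instead of silently dropping them. N-numbers can also be reassigned over time, so a match against the current registry is not guaranteed to be the same airframe that was involved in an older accident.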

Pilot and airmen certification data

The FAA publishes data on all airmen certificates — pilots, mechanics, air traffic controllers, parachute riggers, and dispatchers — through the FAA Airmen Inquiry system and as bulk downloads from the FAA data portal. Approximately 700,000 active pilot certificates are on file, representing the universe of certificated pilots in the United States.

The certificate type hierarchy for pilots runs from Student (which has no minimum flight time) through Sport (minimum 20 hours), Recreational (minimum 30 hours), Private (minimum 40 hours), Commercial (minimum 250 hours), and Airline Transport Pilot (minimum 1,500 hours). Since the 2013 rule change following Colgan Air Flight 3407, first officers in Part 121 operations must also hold an ATP certificate, which previously was required only of captains. Each certificate type carries different privilege and limitation sets: a Private certificate holder may carry passengers but cannot be compensated; a Commercial certificate allows compensation for certain operations; an ATP is required to serve as a pilot in Part 121 air carrier operations.

The demographic picture in the airmen data is distinctive and has generated substantial policy discussion in the context of the pilot shortage. The pilot workforce skews male: female pilots constitute approximately 6 percent of all certificated pilots, a proportion that has changed slowly over decades despite sustained industry and FAA attention to pilot diversity programs. Racial diversity in the pilot workforce is similarly low. The median age of active pilots is approximately 45, reflecting both the lengthy training pipeline required to reach commercial certificates and the declining entry rates into flight training over the past two decades. The mandatory retirement age of 65 for Part 121 first officers and captains creates a predictable attrition curve that the industry projects will produce significant pilot shortfalls in the late 2020s and 2030s.
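
The attrition curve is predictable because each pilot's last eligible year follows directly from a birth year. The sketch below illustrates the arithmetic only: the function name is hypothetical, the birth years are made up, and it assumes birth years or ages are available to the analyst, which public airmen releases may not provide.

```python
from collections import Counter

MANDATORY_RETIREMENT_AGE = 65  # Part 121 age limit cited in the text


def retirements_by_year(birth_years):
    """Count pilots reaching the age limit in each calendar year."""
    return Counter(year + MANDATORY_RETIREMENT_AGE for year in birth_years)


# Made-up birth years for illustration only.
sample_birth_years = [1962, 1962, 1963, 1965, 1965, 1965, 1970]
curve = retirements_by_year(sample_birth_years)
for year in sorted(curve):
    print(year, curve[year])
```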

Mechanic certification data follows a similar structure. Airframe and Powerplant (A&P) mechanics hold an FAA mechanic certificate with separate ratings for airframe work and powerplant work, each earned through written, oral, and practical tests. Inspection Authorization (IA) holders are A&P mechanics authorized to perform and sign off annual inspections on certificated aircraft. The IA workforce is also aging, and the geographic distribution of IA holders, who are concentrated in areas with large general aviation populations, affects maintenance access for pilots at rural and remote airports.

Python: NTSB accident analysis by state and phase of flight

The following script downloads the NTSB aviation accident database CSV, filters to fixed-wing general aviation accidents in the past five years, groups records by state and phase of flight, computes the fatal accident rate by phase, and identifies the three phases with the highest fatality probability. It also summarizes probable cause categories.

import requests
import pandas as pd
import io
import zipfile
from datetime import date

# NTSB Aviation Accident Database bulk download
# Available at: https://www.ntsb.gov/Pages/AviationDownloadData.aspx
# Download the main accident CSV (AviationData.zip or similar)

NTSB_BULK_URL = "https://data.ntsb.gov/avdata/FileDirectory/DownloadFile?fileID=C%3A%5Cavdata%5Cavall.zip"

resp = requests.get(NTSB_BULK_URL, timeout=300)
resp.raise_for_status()

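with zipfile.ZipFile(io.BytesIO(resp.content)) as zf:
    names = zf.namelist()
    csv_names = [n for n in names if n.lower().endswith(".csv")]
    if not csv_names:
        # Fail early with a clear message instead of parsing a non-CSV file
        raise RuntimeError(f"No CSV found in archive; contents: {names}")
    # Locate the primary accident CSV, falling back to the first CSV present
    acc_name = next(
        (n for n in csv_names if "avall" in n.lower()),
        csv_names[0],
    )
    with zf.open(acc_name) as f:
        df = pd.read_csv(f, encoding="latin-1", low_memory=False)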

# Normalize column names (exports vary: mixed case, spaces, or dots as separators)
df.columns = [c.strip().upper().replace(" ", "_").replace(".", "_") for c in df.columns]

# Parse event date
df["EVENT_DATE"] = pd.to_datetime(df["EVENT_DATE"], errors="coerce")

# Filter: fixed-wing general aviation, past 5 years
cutoff = pd.Timestamp(date.today().year - 5, 1, 1)


def norm(col):
    """Upper-cased, stripped text for a column; missing values become ''."""
    return df[col].fillna("").astype(str).str.strip().str.upper()


mask = (
    (df["EVENT_DATE"] >= cutoff)
    # Airplane; "AIRPLANE" is accepted on the assumption that some exports spell it out
    & norm("AIRCRAFT_CATEGORY").isin(["AIR", "AIRPLANE"])
    & norm("AMATEUR_BUILT").isin(["NO", "N", ""])
    # Exclude commercial Part 121 / scheduled service
    & ~norm("AIR_CARRIER").isin(["PART 121", "AIR CARRIER"])
)
ga = df[mask].copy()

print(f"General aviation fixed-wing accidents in dataset: {len(ga)}")

# Standardize phase-of-flight labels
phase_col = "BROAD_PHASE_OF_FLIGHT" if "BROAD_PHASE_OF_FLIGHT" in ga.columns else "PHASE_OF_FLIGHT"
ga["PHASE"] = ga[phase_col].str.strip().str.upper().fillna("UNKNOWN")

# Fatal flag: any recorded fatal injuries
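fatal_col = next(
    (c for c in ("TOTAL_FATAL_INJURIES", "FATAL_INJURIES") if c in ga.columns),
    None,
)
# If neither column exists, treat every record as having zero fatalities
ga["TOTAL_FATAL"] = (
    pd.to_numeric(ga[fatal_col], errors="coerce").fillna(0) if fatal_col else 0
)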
ga["IS_FATAL"] = ga["TOTAL_FATAL"] > 0

# --- Analysis 1: accidents and fatalities by phase of flight ---
phase_stats = (
    ga.groupby("PHASE")
    .agg(
        accidents=("IS_FATAL", "count"),
        fatal_accidents=("IS_FATAL", "sum"),
        total_fatalities=("TOTAL_FATAL", "sum"),
    )
    .reset_index()
)
phase_stats["fatality_rate"] = (
    phase_stats["fatal_accidents"] / phase_stats["accidents"]
)
phase_stats = phase_stats.sort_values("fatality_rate", ascending=False)

print("\nPhase of flight -- fatal accident rate (descending):")
print(phase_stats[["PHASE", "accidents", "fatal_accidents", "fatality_rate"]].to_string(index=False))

# Top 3 phases by fatality probability
top3 = phase_stats.head(3)
print("\nThree phases with highest fatal-accident probability:")
for _, row in top3.iterrows():
    pct = row["fatality_rate"] * 100
    print(f"  {row['PHASE']}: {pct:.1f}% of accidents were fatal ({int(row['fatal_accidents'])}/{int(row['accidents'])})")

# --- Analysis 2: accidents by state ---
state_counts = (
    ga.groupby("STATE")
    .agg(accidents=("IS_FATAL", "count"), fatalities=("TOTAL_FATAL", "sum"))
    .reset_index()
    .sort_values("accidents", ascending=False)
)
print("\nTop 10 states by GA accident count (last 5 years):")
print(state_counts.head(10).to_string(index=False))

# --- Analysis 3: probable cause category summary ---
cause_col = next(
    (c for c in ga.columns if "CAUSE" in c and "PROBABLE" in c),
    None,
)
if cause_col:
    cause_summary = ga[cause_col].str[:60].value_counts().head(15)
    print("\nTop probable cause categories (first 60 chars):")
    print(cause_summary.to_string())

Implementation notes. The NTSB bulk file naming has varied across database releases; the script searches the ZIP contents by substring match rather than a hardcoded filename. The phase-of-flight column name also varies — earlier releases use PHASE_OF_FLIGHT while more recent ones use BROAD_PHASE_OF_FLIGHT as a consolidated field; the script tests for both. Filtering out commercial air carrier operations using the AIR_CARRIER field is approximate — a cleaner approach joins against the FAA operator certificate database, but the column filter handles the majority of scheduled carrier records. The probable cause analysis uses the first 60 characters of the narrative field because full narratives are long and vary widely; production analysis would use keyword or regex matching against the coded cause/factor fields in the related Findings table.
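
The keyword approach mentioned in the notes can be sketched as an ordered list of regular expressions applied to the cause text. This is an illustration, not the NTSB's coding scheme: the category labels, patterns, and function name are all invented here, and a real analysis would tune them against the coded findings.

```python
import re

# Order matters: the first matching pattern wins.
CAUSE_PATTERNS = [
    ("FUEL", re.compile(r"fuel (exhaustion|starvation)", re.I)),
    ("LOSS_OF_CONTROL", re.compile(
        r"(failure to maintain|loss of) .{0,40}control|\bstall", re.I)),
    ("VFR_INTO_IMC", re.compile(
        r"instrument meteorological|VFR .{0,40}IMC", re.I)),
    ("MECHANICAL", re.compile(
        r"(engine|mechanical) .{0,30}(failure|malfunction)|loss of engine power",
        re.I)),
]


def categorize_cause(text) -> str:
    """Map a probable-cause narrative to a coarse, illustrative category."""
    if not isinstance(text, str) or not text.strip():
        return "UNKNOWN"
    for label, pattern in CAUSE_PATTERNS:
        if pattern.search(text):
            return label
    return "OTHER"


print(categorize_cause("A total loss of engine power due to fuel exhaustion."))  # FUEL
```

Applied to the script's data, this would look like `ga[cause_col].map(categorize_cause).value_counts()`. Putting the fuel pattern first reflects the point made earlier that fuel exhaustion presents as power loss but is a pilot error variant.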
