NTSB Aviation Accident Database: The Federal Record Behind Every US Aircraft Accident Investigation
The National Transportation Safety Board maintains the most comprehensive public database of civil aviation accidents in the United States — approximately 400,000 accident and incident records covering every civil aviation accident since 1962, each with investigation findings, probable causes, contributing factors, and safety recommendations that have reshaped how aircraft are designed, certified, and operated.
What the NTSB accident database contains
The NTSB Aviation Accident Database, served through the agency's CAROL query system (Case Analysis and Reporting Online), is the definitive federal record of civil aviation accidents and incidents occurring within the United States or involving US-registered aircraft abroad. The database is searchable at data.ntsb.gov and available for bulk download as either a Microsoft Access database (MDB) or CSV export. It is updated continuously as investigations close and probable cause determinations are issued.
Each record in the database captures the accident or incident across multiple dimensions. Event identification fields include the NTSB event number (a unique identifier that combines a three-letter investigating office code, a two-digit year, a two-letter investigation type code, and a sequential number, for example WPR23LA042, where WPR denotes the Western Pacific Region), the event date and time, and the geographic location down to airport identifier or nearest city and county. Aircraft fields capture the registration N-number, manufacturer and model, year of manufacture, amateur-built flag, number of engines, and engine type. Operator fields identify the operating certificate type and the name of the operator.
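As a rough illustration of how such an identifier can be split into its parts, the sketch below parses the example number with a regular expression. The pattern and the field names (`prefix`, `year`, `type_code`, `sequence`) are assumptions inferred from the single example WPR23LA042, not an official NTSB specification, and older or amended records may not fit this shape.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from the example WPR23LA042:
# 3-letter prefix, 2-digit year, 2-letter code, 3-digit sequence.
_NTSB_NUMBER = re.compile(r"^([A-Z]{3})(\d{2})([A-Z]{2})(\d{3})$")


class NtsbNumber(NamedTuple):
    prefix: str
    year: str
    type_code: str
    sequence: str


def parse_ntsb_number(value: str) -> Optional[NtsbNumber]:
    """Return the parts of an NTSB number, or None if it does not match."""
    match = _NTSB_NUMBER.match(value.strip().upper())
    return NtsbNumber(*match.groups()) if match else None


print(parse_ntsb_number("WPR23LA042"))
```

Returning `None` for anything that does not match keeps the helper safe to apply across a whole column, where unexpected formats are likely.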
Injury fields record the total number of fatalities, serious injuries, minor injuries, and uninjured persons on board. Aircraft damage is coded as Destroyed, Substantial, Minor, or None. The investigation type field distinguishes full Accident investigations from the lower-tier Incident investigations. The phase of flight at which the event occurred is coded using the NTSB broad phase taxonomy: Takeoff, Climb, Cruise, Descent, Approach, Landing, Standing, Taxi, Maneuvering, and Other. Weather conditions are coded as VMC (visual meteorological conditions), IMC (instrument meteorological conditions), or Unknown.
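Analyses often collapse the separate injury counts into a single severity label per event. The sketch below shows one way to do that; the rule that the highest level present wins is an assumption made for illustration, and the function name is hypothetical. The labels mirror the Fatal, Serious, Minor, None values referenced in the analysis script later in this article.

```python
def highest_injury_level(fatal: int, serious: int, minor: int) -> str:
    """Collapse per-event injury counts into one label (highest level wins)."""
    if fatal > 0:
        return "Fatal"
    if serious > 0:
        return "Serious"
    if minor > 0:
        return "Minor"
    return "None"


# An event with one serious and two minor injuries is labeled "Serious".
print(highest_injury_level(fatal=0, serious=1, minor=2))
```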
The most consequential field in the database is the probable cause narrative, a formal written determination issued by the NTSB after the investigation concludes. Probable cause narratives state the safety deficiencies that the NTSB determined were causally related to the accident. Contributing factors are listed separately as elements that increased the probability or severity of the accident without rising to the level of a direct cause. Not all records have probable cause narratives: ongoing investigations are flagged as “Probable Cause Pending” until the final determination is issued, which can take anywhere from months to several years depending on the complexity of the investigation.
The database also stores links to associated documents: preliminary reports issued within days of a major accident, factual reports compiling physical and operational evidence, docket files containing all investigation exhibits and witness statements, and the Final Report or Aircraft Accident Report for major investigations reaching formal Board-level review. These documents are hosted on the NTSB website and linked from each accident record.
Accident vs. incident definitions
The NTSB operates under precise statutory definitions for accident and incident established in 49 CFR Part 830, the NTSB's regulations on notification and reporting requirements. The definitions determine which events require immediate notification to the NTSB and what level of investigation follows.
An aviation accident is an occurrence associated with the operation of an aircraft which takes place between the time any person boards the aircraft with the intention of flight and all such persons have disembarked, in which any person suffers death or serious injury, or in which the aircraft receives substantial damage. All three paths to the accident threshold — fatality, serious injury, or substantial aircraft damage — trigger the full notification and investigation requirements.
Serious injury is defined by 49 CFR 830.2 as any injury that: requires hospitalization for more than 48 hours, commencing within 7 days from the date of the injury; results in a fracture of any bone, except simple fractures of fingers, toes, or nose; causes severe hemorrhages or nerve, muscle, or tendon damage; involves any internal organ; or involves second- or third-degree burns, or any burns affecting more than 5 percent of the body surface. Exposure to infectious substances or injurious radiation appears in the ICAO Annex 13 definition of serious injury rather than in 49 CFR 830.2.
Substantial damage means damage or failure that adversely affects the structural strength, performance, or flight characteristics of the aircraft, and that would normally require major repair or replacement of the affected component. The regulation lists items that are not considered substantial damage: engine failure or damage limited to an engine, if only one engine fails or is damaged; bent fairings or cowling; dented skin; small puncture holes in the skin or fabric; ground damage to rotor or propeller blades; and damage to landing gear, wheels, tires, flaps, engine accessories, brakes, or wingtips. Because of these exclusions, many general aviation mishaps in which the aircraft can be repaired and returned to service fall below the substantial damage threshold and are not classified as accidents on damage grounds alone.
An aviation incident is any occurrence other than an accident that affects or could affect the safety of operations. Incidents do not meet the fatality, serious injury, or substantial damage thresholds but still represent safety-significant events. Examples include: a runway incursion where a collision was narrowly avoided; an engine shutdown in flight on a multi-engine aircraft; a loss of pressurization; a fuel exhaustion event that resulted in an emergency landing without injury or substantial damage; or an airspace violation that created collision risk. Certain categories of incidents require immediate NTSB notification, including flight control system malfunctions on air carrier aircraft, in-flight fires, and near mid-air collisions.
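The three-path accident test described in this section can be sketched as a small classifier. This is a simplified illustration, not a legal determination: the `Occurrence` type and `classify` function are hypothetical names, and the sketch assumes the serious-injury and substantial-damage judgments have already been made according to the regulatory definitions.

```python
from dataclasses import dataclass


@dataclass
class Occurrence:
    fatalities: int = 0
    serious_injuries: int = 0
    substantial_damage: bool = False  # already judged against the regulation


def classify(occ: Occurrence) -> str:
    """Apply the three-path accident test; anything else counts as an incident."""
    if occ.fatalities > 0 or occ.serious_injuries > 0 or occ.substantial_damage:
        return "Accident"
    return "Incident"


# A precautionary landing with no injuries and only minor damage.
print(classify(Occurrence()))
# A hard landing that bends the airframe but injures no one.
print(classify(Occurrence(substantial_damage=True)))
```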
The investigation pipeline
Every aviation accident or qualifying incident triggers a defined investigation pipeline. The speed and depth of investigation depend on the severity of the event, the type of operation involved, and the NTSB's assessment of systemic safety implications.
Notification
49 CFR Part 830 requires immediate notification to the NTSB by the operator or pilot-in-command for any accident and for specified categories of serious incidents. Notification is made to the NTSB 24-hour Operations Center at (844) 373-9922. The FAA duty officer is simultaneously notified and passes the report to the NTSB. Aircraft operators subject to Part 121 commercial air carrier rules are required to maintain notification systems that ensure immediate reporting without reliance on any individual pilot or employee.
Go-team deployment
For major accidents — those with multiple fatalities, significant public interest, or apparent systemic causes — the NTSB deploys a go-team to the accident site, sometimes within hours of the event. The go-team is led by an Investigator-in-Charge (IIC) and includes specialists organized into investigative groups covering operations, airworthiness, structures, systems, powerplants, survival factors, weather, air traffic control, and human performance. The FAA is a party to every NTSB investigation by statute; for accidents involving air carrier aircraft, the manufacturer, airline, pilots union, and other organizations with operational knowledge are typically granted party status, which gives them access to investigation activities in exchange for technical assistance.
The party system is a distinctive feature of NTSB investigations that has been both praised for the technical depth it enables and criticized for giving regulated entities access to evidence collection. Boeing, Airbus, GE Aviation, Pratt and Whitney, and major airlines regularly serve as parties in accidents involving their products. Party status gives entities access to wreckage examination, witness interviews, and factual report review — a dynamic that critics argue can produce organizational pressure on the investigation process even when the NTSB maintains formal independence.
Factual report
The factual report compiles the physical and documentary evidence without reaching conclusions about cause. It typically includes a flight history narrative, witness statements, cockpit voice recorder and flight data recorder readouts when available, meteorological information, air traffic control transcripts, medical examiner reports, maintenance records, and engineering analyses of structural components and systems. Factual reports for major accidents run to thousands of pages and are posted to the NTSB public docket at data.ntsb.gov.
Probable cause report
The probable cause report — for most general aviation accidents, a brief final determination issued by the IIC without full Board review — states the causal chain and contributing factors. The language is formulaic: the NTSB has determined the probable cause of this accident to be the specific failure or decision in question, with contributing factors listed separately. The cause statement is the legally and operationally significant output of the investigation: it is the record that FAA enforcement actions, civil litigation, and safety recommendations reference, and it is the data field that enables aggregate statistical analysis of accident causes across the full database.
Board Meeting and Aircraft Accident Report
The most significant investigations — those involving air carrier accidents, novel safety issues, or events with broad systemic implications — proceed to a formal Board Meeting at NTSB headquarters in Washington, D.C. The Board Meeting is a public proceeding at which staff present their findings, Board members deliberate on the probable cause determination and safety recommendations, and representatives of the parties may submit written comments. The resulting Aircraft Accident Report is a comprehensive document that often runs to several hundred pages and includes all five Board members' votes on the probable cause determination. Safety recommendations issued in connection with the Aircraft Accident Report are tracked in the NTSB safety recommendations database, where the FAA, aircraft manufacturers, and airlines must formally respond.
Probable cause and contributing factors
The probable cause taxonomy used in the NTSB database reflects decades of accident investigation experience. The NTSB does not use a fixed code list for probable causes; the determination is written in plain language by the IIC or the Board. However, the CAROL database includes coded fields for findings derived from the narrative, organized into categories that enable systematic analysis across the full accident record.
The coded cause and finding fields in the database use a three-level hierarchy: Category (broad area such as Personnel, Aircraft, Environment, Organizational), Subcategory (more specific grouping such as Pilot, Maintenance, Weather, Air Traffic Control), and Finding (the specific deficiency, such as “failure to maintain aircraft control,” “improper preflight planning,” or “spatial disorientation”). Each accident may have multiple findings, and each finding is coded as a Cause, Factor, or Not a Factor.
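One way to picture the hierarchy is as a list of finding records attached to a single accident. The sketch below is illustrative only: the `Finding` class and its field names are assumptions, and the sample rows (including the weather entry) are invented to show the shape of the data rather than taken from any real record.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    category: str      # e.g. Personnel, Aircraft, Environment, Organizational
    subcategory: str   # e.g. Pilot, Maintenance, Weather
    finding: str       # the specific deficiency
    role: str          # "Cause", "Factor", or "Not a Factor"


# Invented findings for one hypothetical accident.
findings = [
    Finding("Personnel", "Pilot", "failure to maintain aircraft control", "Cause"),
    Finding("Personnel", "Pilot", "improper preflight planning", "Factor"),
    Finding("Environment", "Weather", "gusting wind", "Not a Factor"),
]

# Count categories among findings that were actually cited as cause or factor.
cited = Counter(f.category for f in findings if f.role != "Not a Factor")
print(cited)
```

Keeping the role on each finding, rather than on the accident, is what allows the same event to contribute to several cause and factor tallies at once.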
In general aviation accidents, pilot-related findings dominate the probable cause database. Analysis of NTSB data across recent decades consistently shows pilot error as a finding in approximately 75 to 80 percent of general aviation fatal accidents. The most common specific findings include failure to maintain aircraft control, failure to follow procedures, improper fuel management, spatial disorientation, continued VFR flight into IMC (less frequent than loss of control, but among the most lethal accident types because a large share of such accidents are fatal), and failure to perform adequate preflight inspection.
Mechanical and systems findings appear as probable cause or contributing factor in approximately 15 to 20 percent of general aviation accidents. Engine failures due to fuel exhaustion, carburetor icing, improper maintenance, and component fatigue are the most frequently cited mechanical causes. Airframe structural failures in flight — which typically result from overloading the airframe during aerobatic maneuvers or maneuvering in turbulence at excessive speeds — appear in a smaller but analytically significant subset.
In commercial aviation accidents, the probable cause profile shifts significantly. Crew resource management (CRM) failures, automation-related errors, and organizational deficiencies are more prominent than in general aviation investigations. The NTSB's investigation of Colgan Air Flight 3407 in 2009 produced a probable cause finding that identified the captain's inappropriate response to the activation of the stick shaker as the cause, and cited the flight crew's failure to monitor airspeed, the crew's failure to adhere to sterile cockpit procedures, the captain's failure to effectively manage the flight, and Colgan Air's inadequate procedures for airspeed selection and management during approaches in icing conditions as contributing factors. The investigation's broader findings on training and fatigue drove the legislative response embodied in the Airline Safety and Federal Aviation Administration Extension Act of 2010.
Major accident investigations
Colgan Air Flight 3407 (2009)
Colgan Air Flight 3407, a Bombardier Dash-8 Q400 operating as Continental Connection, crashed on approach to Buffalo Niagara International Airport on February 12, 2009, killing all 49 persons aboard and one person on the ground. The NTSB determined the probable cause was the captain's inappropriate response to the activation of the stick shaker (he pulled back on the control column rather than lowering the nose), which led to an aerodynamic stall from which the airplane did not recover. Contributing factors were the flight crew's failure to monitor airspeed in relation to the rising low-speed cue, the flight crew's failure to adhere to sterile cockpit procedures, the captain's failure to effectively manage the flight, and Colgan Air's inadequate procedures for airspeed selection and management during approaches in icing conditions. The report also documented the first officer's retraction of the flaps during the stall event, although that action was not part of the probable cause statement.
The Colgan Air investigation is considered the most consequential NTSB investigation of the past two decades in terms of regulatory response. The Airline Safety and FAA Extension Act of 2010 that followed required all Part 121 first officers to hold Airline Transport Pilot certificates, raising the minimum from 250 to 1,500 flight hours; mandated new fatigue rules for airline crews implemented as the FAA's Part 117 rest rules; and required airlines to disclose the full training and employment history of pilots to prospective employers through an FAA-maintained database.
Southwest Airlines Flight 1380 (2018)
Southwest Airlines Flight 1380, a Boeing 737-700, suffered an engine failure on April 17, 2018, when a fan blade separated from the CFM56-7B engine due to metal fatigue. Debris struck the fuselage, breaking a window; a passenger was partially ejected and died of blunt impact trauma, becoming the first commercial aviation passenger fatality in the United States in more than nine years. Eight other passengers sustained minor injuries. The NTSB attributed the failure to a low-cycle fatigue crack in the fan blade dovetail; the separated blade struck the fan case at a location critical to the structural integrity of the fan cowl, and departing inlet and fan cowl fragments struck the fuselage near a cabin window. The investigation led to an FAA airworthiness directive requiring ultrasonic inspection of all CFM56-7B fan blades above a specified number of flight cycles.
Lion Air Flight 610 and Ethiopian Airlines Flight 302 (2018–2019)
The Boeing 737 MAX accidents represent the most consequential aviation safety crisis of the 2010s. Lion Air Flight 610 crashed into the Java Sea on October 29, 2018, killing all 189 aboard; Ethiopian Airlines Flight 302 crashed on March 10, 2019, killing all 157 aboard. Both crashes involved the Maneuvering Characteristics Augmentation System (MCAS), a flight control law introduced on the 737 MAX to compensate for the change in pitch characteristics caused by its larger, repositioned engines. In both cases MCAS activated on erroneous input from a single angle-of-attack sensor and repeatedly commanded nose-down trim, and the flight crews were unable to regain control. The 737 MAX was grounded worldwide in March 2019, the longest grounding of a commercial aircraft type in history, and was cleared by the FAA to return to service in November 2020 after extensive software and documentation changes.
The NTSB participated through its accredited representative in the Indonesian and Ethiopian investigations, which were led by the respective national accident investigation authorities under ICAO Annex 13 to the Chicago Convention. The NTSB separately issued a safety recommendation report in September 2019 addressing the assumptions about pilot response used in the safety assessment of MCAS and the effect of multiple simultaneous alerts on flight crew performance. Subsequent reviews of the 737 MAX certification documented deficiencies in FAA oversight of Boeing's certification activities and design assumptions that did not account for the failure mode that occurred in both accidents.
Alaska Airlines Flight 1282 — 737 MAX 9 door plug (January 2024)
On January 5, 2024, an Alaska Airlines Boeing 737 MAX 9 operating as Flight 1282 suffered a door plug blowout at approximately 16,000 feet altitude shortly after departure from Portland International Airport. The mid-cabin door plug, a panel installed over an unused door cutout on the fuselage, separated from the aircraft, leaving a door-sized opening in the left side of the cabin. No one was ejected and there were no fatalities or serious injuries, although several occupants sustained minor injuries. The aircraft landed safely at Portland. The seats immediately adjacent to the door plug were unoccupied, a circumstance widely noted as having likely prevented an ejection and death.
The NTSB investigation determined that the four bolts that keep the door plug from moving upward were missing at the time of the accident. Documentary evidence showed the door plug had been opened at Boeing's final assembly facility in Renton, Washington so that Spirit AeroSystems personnel could repair rivets on the adjacent fuselage structure, which Spirit had built in Wichita, Kansas, and the bolts were not reinstalled when the plug was closed. Boeing's quality control process did not detect the missing hardware before delivery. The investigation revealed systemic quality management failures at both Spirit AeroSystems and Boeing and triggered a broader FAA audit of Boeing's quality systems, coming while Boeing was already under intensive FAA oversight following the 737 MAX groundings.
Midair collision near Albuquerque (2023)
On June 16, 2023, a Cessna 182 and a Cessna 152 collided in midair near Albuquerque, New Mexico, killing all five persons aboard both aircraft. The accident occurred in uncontrolled airspace at low altitude during training operations from a local flight school. The NTSB investigation identified failure of both flight crews to see and avoid each other as the probable cause, consistent with the see-and-avoid responsibility that governs visual flight rules operations in uncontrolled airspace. Contributing factors included sun angle and visual background conditions that reduced the conspicuity of the other aircraft in the respective pilots' fields of view. The accident renewed discussion of automatic dependent surveillance-broadcast (ADS-B) equipage for training aircraft, which are not required to carry ADS-B Out when they operate entirely outside the airspace where the FAA mandates it.
Helicopter EMS accidents
Helicopter emergency medical service (HEMS) operations occupy a disproportionate share of NTSB helicopter accident investigations relative to their share of total helicopter operations. HEMS operations combine factors that individually elevate accident risk: night operations, response in adverse weather, remote landing zones with uncharted obstacles, and time pressure that may compress pre-flight risk assessment. The NTSB has issued multiple safety studies specifically on HEMS accidents and has made repeated safety recommendations regarding terrain awareness and warning systems (TAWS), night vision goggle requirements, and IFR certification requirements for HEMS operations.
Between 2010 and 2023, HEMS operations accounted for approximately 50 to 60 fatal accidents, killing over 100 flight crew and medical personnel. The NTSB's Most Wanted List of aviation safety improvements has consistently included HEMS operational improvements, and FAA rulemaking following NTSB recommendations has required TAWS on HEMS aircraft, pilot certification upgrades, and company-level risk assessment programs. Despite these improvements, HEMS accident rates remain elevated relative to other certificated helicopter operations.
Commercial vs. general aviation patterns
The NTSB accident database reveals a stark division between commercial and general aviation safety performance. This division is not marginal — it is one of the most pronounced disparities in any safety-regulated industry, spanning roughly two orders of magnitude in fatal accident rate per flight hour.
| Metric | Part 121 (Airlines) | Part 91 (General Aviation) |
|---|---|---|
| Annual accidents (recent average) | ~30–50 | ~1,100–1,350 |
| Annual fatal accidents | 0–5 | ~200–250 |
| Annual fatalities | 0–10 (typical years) | ~300–400 |
| Fatal accident rate per 100,000 flight hours | ~0.005–0.01 | ~0.8–1.2 |
| Cockpit voice recorder required | Yes (14 CFR 121.359) | No |
| Flight data recorder required | Yes (14 CFR 121.344) | No |
| ATP certificate required for PIC | Yes (1,500 hours min.) | No (PPL + ratings) |
| IFR operations required | Yes | No |
General aviation (operations conducted under Part 91 of the Federal Aviation Regulations, encompassing personal and recreational flying, flight training, business aviation, and aerial work outside commercial certification) accounts for over 95 percent of all NTSB accident records in the database. The approximately 1,100 to 1,350 general aviation accidents per year recorded by the NTSB in recent years represent a significant long-term decline from the 1970s peak, when general aviation accidents exceeded 4,000 per year on a smaller fleet.
The fatal accident rate in general aviation in recent years is approximately 0.8 to 1.2 fatal accidents per 100,000 flight hours, representing a steady improvement from 1.5 to 2.0 per 100,000 hours in the 1990s. The NTSB's General Aviation Safety Initiative (GASI), launched in 2017, focused on loss of control in flight (LOC-I) as the leading accident category by fatalities. LOC-I — which encompasses stalls, spins, and uncontrolled flight that typically results in ground impact at high descent rate — accounts for approximately 40 percent of general aviation fatal accidents, making it the single most important target for safety intervention.
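The rate figures above are simple ratios, and the "two orders of magnitude" comparison can be checked directly from the ranges quoted in this section. The sketch below shows the arithmetic; the accident count and flight hours passed to the helper are hypothetical values chosen only to demonstrate the formula, and the function name is illustrative.

```python
import math


def fatal_rate_per_100k_hours(fatal_accidents: int, flight_hours: float) -> float:
    """Fatal accidents per 100,000 flight hours."""
    return fatal_accidents / flight_hours * 100_000


# Hypothetical inputs, chosen only to show the arithmetic.
example_rate = fatal_rate_per_100k_hours(fatal_accidents=220, flight_hours=22_000_000)

# Rate ranges quoted in this section (per 100,000 flight hours).
ga_low, ga_high = 0.8, 1.2
airline_low, airline_high = 0.005, 0.01

smallest_gap = ga_low / airline_high   # narrowest comparison
largest_gap = ga_high / airline_low    # widest comparison

print(f"example rate: {example_rate:.2f}")
print(f"GA-to-airline ratio: {smallest_gap:.0f}x to {largest_gap:.0f}x")
print(
    "orders of magnitude: "
    f"{math.log10(smallest_gap):.1f} to {math.log10(largest_gap):.1f}"
)
```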
Commercial aviation under Part 121 has achieved a safety record without parallel in transportation history. Between 2009 and 2024, Part 121 operations killed fewer than 100 passengers in the United States across all accidents combined — in an operating environment of approximately 900 million passenger enplanements per year. The last fatal crash of a US-based Part 121 carrier with passenger fatalities before the Southwest Flight 1380 engine failure in 2018 was Colgan Air Flight 3407 in 2009. The fatality rate per billion revenue passenger-miles for US Part 121 operations is functionally zero in most recent years.
Part 135 operations — on-demand air charter, air taxi, and commuter air carrier operations — occupy an intermediate safety position. Part 135 carriers operate under stricter requirements than Part 91 but less stringent ones than Part 121. HEMS operations are predominantly Part 135. The fatal accident rate for Part 135 operations is significantly higher than Part 121 but lower than general aviation Part 91, reflecting the intermediate level of operational and maintenance requirements.
Data access and downloading
The NTSB provides public access to the aviation accident database through several channels suited to different use cases.
Web search interface. The CAROL search system at data.ntsb.gov/carol-main-public/basic-search allows free-text and structured queries against the full accident and incident database. Users can filter by date range, aircraft type, state, injury severity, operator, and dozens of additional fields. Individual accident records include links to all associated documents in the NTSB public docket system. The search interface is the primary access point for journalists, attorneys, researchers, and aviation professionals looking up specific accidents or performing targeted queries.
Bulk CSV download. The NTSB publishes the full database as a downloadable CSV export accessible from the basic search page. The download includes all accidents and incidents, with one row per event and columns for the primary identification, classification, and finding fields. The CSV does not include the full probable cause narrative for all records (those are in the linked documents), but it includes coded finding fields and a brief probable cause summary field for closed investigations. The download is updated continuously.
Microsoft Access database (MDB). The NTSB also publishes the database in MDB format, which contains a richer relational structure with separate tables for events, aircraft, findings, injuries, sequences of events, and narratives. The MDB format is the most complete representation of the database schema and is preferred for analytical work requiring the full relational structure. Access to MDB requires Microsoft Access or a compatible tool such as mdbtools (Linux/macOS open-source), DBeaver, or the Python mdb-parser library. The additional tables in the MDB format enable join-based queries that the flat CSV does not support — for example, linking each aircraft in a multi-aircraft accident to its specific finding codes, or joining all findings for a given event to analyze cause-factor combinations.
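The kind of join the relational format makes possible can be sketched with an in-memory SQLite database standing in for the MDB file. Everything in this example is an assumption made for illustration: the table names, the column names (`ev_id`, `aircraft_key`, and so on), and the sample rows are invented and should be checked against the actual schema before use.

```python
import sqlite3

# In-memory stand-in for the relational export; schema and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events   (ev_id TEXT PRIMARY KEY, ev_year INTEGER);
    CREATE TABLE aircraft (ev_id TEXT, aircraft_key INTEGER, model TEXT);
    CREATE TABLE findings (ev_id TEXT, aircraft_key INTEGER, finding TEXT, role TEXT);

    INSERT INTO events VALUES ('E1', 2022), ('E2', 2023);
    INSERT INTO aircraft VALUES
        ('E1', 1, 'Model A'), ('E1', 2, 'Model B'), ('E2', 1, 'Model C');
    INSERT INTO findings VALUES
        ('E1', 1, 'finding X', 'Cause'),
        ('E1', 2, 'finding Y', 'Factor'),
        ('E2', 1, 'finding Z', 'Cause');
    """
)

# Link each aircraft in a multi-aircraft event to its own findings.
rows = conn.execute(
    """
    SELECT a.ev_id, a.model, f.finding, f.role
    FROM aircraft AS a
    JOIN findings AS f
      ON f.ev_id = a.ev_id AND f.aircraft_key = a.aircraft_key
    ORDER BY a.ev_id, a.aircraft_key
    """
).fetchall()

for row in rows:
    print(row)
```

Joining on both the event key and the aircraft key is the important detail: joining on the event key alone would attach every finding in a two-aircraft event to both aircraft.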
CAROL API. The NTSB provides a REST API at data.ntsb.gov/carol-ripb/ that supports structured JSON queries against the accident database. The API supports filtering by date, injury severity, aircraft category, state, and other fields, and returns paginated JSON results. No API key is required. Individual accident records can be retrieved by NTSB event number, returning the full structured record including the probable cause text and links to associated documents.
NTSB Most Wanted List. The NTSB Most Wanted List is a separate but related resource identifying the transportation safety improvements the NTSB considers most critical. The aviation section of the Most Wanted List typically includes 10 to 15 action items spanning commercial and general aviation, and each item is supported by safety recommendations tracked in the NTSB recommendations database. The Most Wanted List is updated annually and is publicly accessible at ntsb.gov/safety/mwl. Open recommendations — those the FAA or other recipient has not yet acted on — are prominently flagged.
Python: analyzing the accident database
The following three scripts cover the full workflow: downloading the CSV export from the NTSB bulk download endpoint, analyzing the flat file for fatal accidents, phase-of-flight distribution, and operating rule breakdowns, and querying the CAROL API for structured JSON access to individual accident records.
Step 1: Download the CSV export
import io

import pandas as pd
import requests

# ---------------------------------------------------------------------------
# NTSB Aviation Accident Database
# Primary access: https://data.ntsb.gov/carol-main-public/basic-search
#
# The NTSB publishes the full accident database as a downloadable CSV (or
# Microsoft Access MDB) updated continuously. The CSV export covers all
# civil aviation accidents and incidents from 1962 to present.
#
# To export: go to the CAROL basic search interface, run an unrestricted
# query (no filters), and click the CSV download button. The URL below
# reflects the typical export endpoint; verify against the current site.
# ---------------------------------------------------------------------------
CSV_URL = (
    "https://data.ntsb.gov/carol-main-public/basic-search"
    "?queryId=0&criteria=0&format=csv"
)

print("Downloading NTSB Aviation Accident Database CSV export...")
resp = requests.get(CSV_URL, timeout=180)
resp.raise_for_status()

# The response is plain CSV; encoding varies by release. Decode the raw
# bytes explicitly: resp.text never raises UnicodeDecodeError, so a
# fallback wrapped around it would never run.
try:
    text = resp.content.decode("utf-8")
except UnicodeDecodeError:
    text = resp.content.decode("latin-1")
df = pd.read_csv(io.StringIO(text), low_memory=False)

print(f"Records loaded: {len(df):,}")
print(f"Columns ({len(df.columns)}): {list(df.columns)}")
Step 2: Analyze accident patterns
import pandas as pd
# Assumes df is already loaded from the download step above.
# Normalize column names for consistent access across releases.
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
# ---------------------------------------------------------------------------
# 1. Filter: fatal accidents only
# ---------------------------------------------------------------------------
# The 'injury_severity' column uses NTSB severity codes:
# Fatal, Serious, Minor, None
fatal = df[df["injury_severity"].str.upper().str.contains("FATAL", na=False)].copy()
print(f"Fatal accidents in database: {len(fatal):,}")
print(f"Total accident records: {len(df):,}")
# ---------------------------------------------------------------------------
# 2. Annual accident counts (all accidents and fatal only)
# ---------------------------------------------------------------------------
df["event_date"] = pd.to_datetime(df["event_date"], errors="coerce")
df["year"] = df["event_date"].dt.year
fatal["event_date"] = pd.to_datetime(fatal["event_date"], errors="coerce")
fatal["year"] = fatal["event_date"].dt.year
annual_all = df.groupby("year").size().rename("all_accidents")
annual_fatal = fatal.groupby("year").size().rename("fatal_accidents")
annual = pd.concat([annual_all, annual_fatal], axis=1).loc[2015:2023]
print("\nAnnual accident counts (2015-2023):")
print(f"{'Year':<6} {'All Accidents':>14} {'Fatal Accidents':>16}")
print("-" * 40)
for yr, row in annual.iterrows():
print(
f"{int(yr):<6} {int(row.get('all_accidents', 0)):>14,} "
f"{int(row.get('fatal_accidents', 0)):>16,}"
)
# ---------------------------------------------------------------------------
# 3. Accidents by aircraft category
# ---------------------------------------------------------------------------
cat_col = next(
(c for c in df.columns if "aircraft_category" in c or "acft_category" in c), None
)
if cat_col:
cat_counts = df[cat_col].value_counts().head(10)
print("\nAccidents by aircraft category (all years):")
for cat, cnt in cat_counts.items():
pct = cnt / len(df) * 100
print(f" {str(cat):<30} {cnt:>8,} ({pct:.1f}%)")
# ---------------------------------------------------------------------------
# 4. Phase of flight distribution
# ---------------------------------------------------------------------------
phase_col = next(
(c for c in df.columns if "broad_phase" in c or "phase_of_flt" in c), None
)
if phase_col:
phase_counts = df[phase_col].value_counts().head(12)
print("\nAccidents by phase of flight (top 12, all accidents):")
for phase, cnt in phase_counts.items():
pct = cnt / len(df) * 100
print(f" {str(phase):<30} {cnt:>8,} ({pct:.1f}%)")
# Fatal accidents by phase
if phase_col:
fatal_phase = fatal[phase_col].value_counts().head(10)
print("\nFatal accidents by phase of flight (top 10):")
for phase, cnt in fatal_phase.items():
pct = cnt / len(fatal) * 100
print(f" {str(phase):<30} {cnt:>8,} ({pct:.1f}%)")
# ---------------------------------------------------------------------------
# 5. Recent fatal accidents by operating rule (FAR part)
# ---------------------------------------------------------------------------
far_col = next(
    (c for c in df.columns if "far_description" in c or "far_part" in c), None
)
if far_col:
    recent_fatal = fatal[fatal["year"].between(2018, 2023)]
    by_far = (
        recent_fatal.groupby(far_col)
        .size()
        .sort_values(ascending=False)
        .head(8)
    )
    print("\nFatal accidents 2018-2023 by operating rule (FAR part):")
    for rule, cnt in by_far.items():
        print(f" {str(rule):<45} {cnt:>6,}")
Step 3: Query the CAROL API
import requests
# ---------------------------------------------------------------------------
# NTSB CAROL REST API
# Base URL: https://data.ntsb.gov/carol-ripb/
#
# Provides structured JSON access to accident records and supports
# filtered queries. No API key required.
# ---------------------------------------------------------------------------
BASE = "https://data.ntsb.gov/carol-ripb"
# Query: fatal airplane accidents in 2023 (first page, 20 records)
params = {
    "InjurySeverity": "Fatal",
    "AircraftCategory": "Airplane",
    "DateFrom": "2023-01-01",
    "DateThru": "2023-12-31",
    "pageIndex": 1,
    "pageSize": 20,
}
resp = requests.get(f"{BASE}/api/accidents", params=params, timeout=30)
resp.raise_for_status()
data = resp.json()
print(f"Total records matching query: {data.get('TotalRecords', 'N/A')}")
print(f"Records returned this page: {len(data.get('accidents', []))}")
accidents = data.get("accidents", [])
if accidents:
    print("\nSample accident records:")
    print(
        f"{'Date':<12} {'NTSB Number':<16} {'Location':<32} {'Make/Model':<28}"
    )
    print("-" * 92)
    for acc in accidents[:10]:
        # `or ""` guards against JSON nulls, which would break slicing and formatting
        date = (acc.get("EventDate") or "")[:10]
        ntsb_num = acc.get("NtsbNumber") or ""
        city = acc.get("City") or ""
        state = acc.get("State") or ""
        location = f"{city}, {state}" if city else state
        make = acc.get("Make") or ""
        model = acc.get("Model") or ""
        aircraft = f"{make} {model}".strip()
        print(
            f"{date:<12} {ntsb_num:<16} {location:<32} {aircraft:<28}"
        )
# Retrieve a single accident record with full probable cause narrative
if accidents:
    sample_id = accidents[0].get("NtsbNumber")
    detail_resp = requests.get(
        f"{BASE}/api/accidents/{sample_id}", timeout=30
    )
    if detail_resp.ok:
        detail = detail_resp.json()
        print(f"\nFull record for {sample_id}:")
        # A null ProbableCause (investigation still open) falls back to a placeholder
        probable_cause = detail.get("ProbableCause") or "Not yet issued"
        print(f" Probable Cause: {probable_cause[:300]}")
        print(f" Total Fatal Injuries: {detail.get('TotalFatalInjuries', 0)}")
        print(f" Total Serious Injuries: {detail.get('TotalSeriousInjuries', 0)}")
        print(f" Aircraft Damage: {detail.get('AircraftDamage', '')}")
        print(f" Phase of Flight: {detail.get('BroadPhaseOfFlight', '')}")
        print(f" Weather Condition: {detail.get('WeatherCondition', '')}")
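The Step 3 query reads only the first page of results. The following is a minimal sketch of walking every page, assuming the response shape used in that query (a `TotalRecords` count and an `accidents` list) and a 1-based `pageIndex`. The names `iter_accidents`, `fetch_page`, and `fake_fetch` are hypothetical, and the fetch function is passed in so the loop can be exercised without network access.

```python
from typing import Callable, Dict, Iterator


def iter_accidents(
    fetch_page: Callable[[int, int], Dict],
    page_size: int = 20,
    max_pages: int = 1000,
) -> Iterator[Dict]:
    """Yield accident records page by page until the result set is exhausted.

    fetch_page(page_index, page_size) must return a dict shaped like the
    response in Step 3: {"TotalRecords": int, "accidents": [...]}.
    That shape is an assumption carried over from the example, not a
    documented contract.
    """
    seen = 0
    for page_index in range(1, max_pages + 1):
        payload = fetch_page(page_index, page_size)
        records = payload.get("accidents", [])
        if not records:
            break  # empty page: nothing more to read
        for record in records:
            yield record
        seen += len(records)
        total = payload.get("TotalRecords")
        if total is not None and seen >= total:
            break  # reported total reached


# Offline stand-in for the HTTP call, used only to demonstrate the loop.
_FAKE_RECORDS = [{"NtsbNumber": f"TEST{i:03d}"} for i in range(45)]


def fake_fetch(page_index: int, page_size: int) -> Dict:
    start = (page_index - 1) * page_size
    return {
        "TotalRecords": len(_FAKE_RECORDS),
        "accidents": _FAKE_RECORDS[start:start + page_size],
    }


all_records = list(iter_accidents(fake_fetch, page_size=20))
print(len(all_records))
```

In real use, `fetch_page` would wrap the `requests.get` call from Step 3, setting `pageIndex` and `pageSize` from its arguments and returning `resp.json()`. The `max_pages` cap is a guard against an endpoint that never returns an empty page.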
Implementation notes. The NTSB CSV download URL structure has varied across database releases; if the URL in Step 1 returns a search interface page rather than raw CSV, use the CAROL search interface to perform a query selecting all records and export via the CSV download button. Column names in the CSV are not fully standardized across releases, which is why the analysis script uses substring-matching logic to identify phase-of-flight and operating-rule columns. The MDB format is preferable for production analytical pipelines because it provides stable table and column names across releases. The CAROL API is the most reliable programmatic interface for targeted queries on specific events or date ranges.
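The substring matching described above can be factored into one small helper so that every column lookup behaves the same way. This is a sketch only: `find_column` is a hypothetical name, the header variants are illustrative rather than taken from a specific release, and the lower-casing step is an added assumption about how headers may vary.

```python
from typing import Iterable, Optional, Sequence


def find_column(columns: Iterable[str], fragments: Sequence[str]) -> Optional[str]:
    """Return the first column whose lower-cased name contains any fragment."""
    for col in columns:
        name = str(col).strip().lower()
        if any(fragment in name for fragment in fragments):
            return col
    return None


# Illustrative header variants (assumed, not copied from a real export).
headers_a = ["ev_id", "Broad_Phase_of_Flight", "far_part"]
headers_b = ["ev_id", "phase_of_flt", "FAR_Description"]

phase_a = find_column(headers_a, ["broad_phase", "phase_of_flt"])
phase_b = find_column(headers_b, ["broad_phase", "phase_of_flt"])
missing = find_column(headers_a, ["aircraft_category"])
print(phase_a, phase_b, missing)
```

Returning `None` for a missing column keeps the `if cat_col:` style guards in the analysis script working unchanged.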
For the MDB format on macOS and Linux, the open-source mdbtools package allows extraction of individual tables as CSV files. A typical extraction sequence:
# Shell: list tables in the NTSB MDB file
mdb-tables AviationData.mdb
# Extract key tables to CSV
mdb-export AviationData.mdb events > events.csv
mdb-export AviationData.mdb aircraft > aircraft.csv
mdb-export AviationData.mdb findings > findings.csv
mdb-export AviationData.mdb narratives > narratives.csv
The narratives table in the MDB contains the full probable cause narrative text for each closed investigation, the most analytically valuable field for natural language processing work on aviation safety patterns. Text analysis of probable cause narratives can identify shifts in the language used to describe recurring accident types, track the adoption of safety recommendations in subsequent accident findings, and surface emerging hazard categories before they appear in formal NTSB safety studies.
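As a starting point for that kind of text analysis, the sketch below counts how many narratives per year mention each phrase in a fixed list. Everything specific here is an assumption for illustration: the phrase list, the `year` and `narrative` field names, and the invented sample rows, which stand in for rows read from narratives.csv with `csv.DictReader` after joining an event year.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical phrase list; a real study would derive terms from the data.
TERMS = ["loss of control", "fuel exhaustion", "spatial disorientation"]


def term_counts_by_year(
    rows: Iterable[Dict[str, str]], terms: List[str]
) -> Dict[int, Counter]:
    """Count, per year, how many narratives mention each term at least once."""
    counts: Dict[int, Counter] = {}
    for row in rows:
        year = int(row["year"])
        # Lower-case and collapse whitespace so line breaks do not hide a phrase
        text = re.sub(r"\s+", " ", row["narrative"].lower())
        bucket = counts.setdefault(year, Counter())
        for term in terms:
            if term in text:
                bucket[term] += 1
    return counts


# Invented sample rows, not real NTSB narratives.
sample = [
    {"year": "2019", "narrative": "The pilot's loss of control during landing."},
    {"year": "2019", "narrative": "Fuel exhaustion due to inadequate preflight planning."},
    {"year": "2020", "narrative": "Loss of   control following spatial disorientation in IMC."},
]

result = term_counts_by_year(sample, TERMS)
for year in sorted(result):
    print(year, dict(result[year]))
```

Counting narratives rather than raw occurrences keeps one long report from dominating a year's total; dividing by the number of narratives per year would turn the counts into comparable shares.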
Limitations and research caveats
The NTSB database is the most comprehensive public record of US civil aviation accidents, but several limitations affect its use for research and analysis.
Reporting scope. The database covers civil aviation accidents and selected incidents. Military aviation accidents are investigated by the respective armed service branch and are generally not included. Public aircraft operations (aircraft operated by federal, state, and local government agencies) are only partly covered: the NTSB investigates many public aircraft accidents, but aircraft operated by the armed forces or intelligence agencies fall outside its jurisdiction, though the NTSB may be invited to assist in specific cases.
Incident underreporting. The NTSB incident database is substantially less complete than the accident database. Many incidents that meet the regulatory definition of a reportable incident are never reported by operators, particularly in general aviation. The FAA's Accident and Incident Data System (AIDS) captures some additional events reported to FAA regional offices that did not reach the NTSB, but coverage is patchy. NASA's Aviation Safety Reporting System (ASRS) at asrs.arc.nasa.gov captures voluntarily reported incidents with confidentiality protection — providing a separate, richer window into the near-miss event space than the mandatory NTSB incident database.
Probable cause lag. Probable cause determinations for major accidents can take two to four years or longer. Even a comparatively fast major investigation, Colgan Air Flight 3407, took about a year from the February 2009 accident to the adoption of the Aircraft Accident Report in February 2010. During this period, records appear in the database as Probable Cause Pending, which affects analyses of recent years that depend on closed investigations. Research on accident causes using the NTSB database should exclude the most recent two to three years or explicitly account for the large proportion of open investigations in recent data.
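One way to apply that caveat is to measure the share of records per year that still lack a probable cause and keep only years below a chosen threshold. The sketch below assumes each record has been reduced to an event year and a flag for whether a probable cause has been issued; the function names, the 10% threshold, and the sample records are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pending_share_by_year(records: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Map each year to the fraction of records still lacking a probable cause."""
    totals = defaultdict(int)
    pending = defaultdict(int)
    for year, has_probable_cause in records:
        totals[year] += 1
        if not has_probable_cause:
            pending[year] += 1
    return {year: pending[year] / totals[year] for year in totals}


def usable_years(shares: Dict[int, float], max_pending: float = 0.10) -> List[int]:
    """Years whose pending share is at or below the chosen threshold."""
    return sorted(y for y, s in shares.items() if s <= max_pending)


# Invented records: (event year, probable cause issued?)
records = (
    [(2020, True)] * 20
    + [(2021, True)] * 19 + [(2021, False)] * 1
    + [(2022, True)] * 12 + [(2022, False)] * 8
)

shares = pending_share_by_year(records)
print(shares)
print(usable_years(shares))
```

A data-driven cutoff like this adapts as investigations close, whereas a fixed "drop the last N years" rule has to be revisited with each new download.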
Flight hours denominator. The NTSB database does not contain flight hours data. Accident rates per flight hour — the standard metric for comparing safety across operations of different size — require joining NTSB data to external flight hours sources. For general aviation, the FAA's General Aviation and Part 135 Activity Survey (GAATA) provides fleet-level hours estimates by aircraft type and operation type. For Part 121, BTS Form 41 data provides carrier-level hours. These are aggregate surveys, not exact measurements, which introduces uncertainty in rate calculations particularly for small aircraft categories with limited survey sample sizes.
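The rate calculation itself is simple once an hours estimate is available. The sketch below computes accidents per 100,000 flight hours and shows how uncertainty in the hours estimate carries through to the rate. The accident count, the hours figure, and the ±5% band are placeholder values chosen for the arithmetic, not published survey results.

```python
def rate_per_100k_hours(accidents: int, flight_hours: float) -> float:
    """Accidents per 100,000 flight hours."""
    if flight_hours <= 0:
        raise ValueError("flight_hours must be positive")
    return accidents / flight_hours * 100_000


# Placeholder inputs for illustration only.
accidents = 1_200
estimated_hours = 24_000_000
hours_uncertainty = 0.05  # hypothetical +/-5% band on the survey estimate

rate = rate_per_100k_hours(accidents, estimated_hours)
# More hours than estimated lowers the rate; fewer hours raises it.
low = rate_per_100k_hours(accidents, estimated_hours * (1 + hours_uncertainty))
high = rate_per_100k_hours(accidents, estimated_hours * (1 - hours_uncertainty))

print(f"{rate:.2f} per 100k hours (range {low:.2f} to {high:.2f})")
```

Because the hours estimate sits in the denominator, the resulting band is slightly asymmetric around the central rate, and it widens quickly for small aircraft categories where the survey uncertainty is larger.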
The NTSB vs. FAA distinction. A recurring source of confusion is the division of responsibility between the NTSB and the FAA. The NTSB investigates accidents and determines probable cause; it has no regulatory or enforcement authority. The FAA regulates civil aviation and enforces compliance; it does not determine probable cause for accidents. The NTSB issues safety recommendations to the FAA and to manufacturers, airlines, and other entities, but cannot compel the FAA to act on them. The FAA must respond to NTSB recommendations within 90 days but may disagree or close a recommendation without taking the recommended action. The NTSB tracks FAA responses in its recommendations database, where the status of each open recommendation is publicly visible. Recommendations that the FAA has rejected or failed to act on are labeled “Open — Unacceptable Response” — a designation the NTSB has used as a lever for congressional and public pressure on specific safety issues.
Annual trends summary. As a reference benchmark, the NTSB reported the following approximate accident totals for recent years across all civil aviation operations:
| Year | Total accidents | Fatal accidents | Fatalities |
|---|---|---|---|
| 2018 | 1,299 | 226 | 393 |
| 2019 | 1,300 | 228 | 452 |
| 2020 | 1,078 | 174 | 337 |
| 2021 | 1,225 | 193 | 383 |
| 2022 | 1,218 | 211 | 444 |
| 2023 | 1,189 | 196 | 342 |
The 2020 decline reflects COVID-19 effects on general aviation activity (sharply reduced flight training, business aviation, and recreational flying) rather than any structural safety improvement. The rebound in 2021 and 2022 to near-historical levels as activity recovered is consistent with a relatively stable underlying accident rate per flight hour across the pandemic period, although confirming that requires the flight hours data discussed above. General aviation accidents drive the total in every year; Part 121 air carrier accidents number fewer than 50 annually and, in typical recent years, rarely involve fatalities.
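The size of the 2020 dip and the steadiness of the fatal share can be checked directly from the table. The sketch below uses only the counts listed above and computes year-over-year change in total accidents and the share of accidents that were fatal. It works on counts alone, so it says nothing about rates per flight hour.

```python
# Counts copied from the table above: year -> (total accidents, fatal accidents)
TABLE = {
    2018: (1299, 226),
    2019: (1300, 228),
    2020: (1078, 174),
    2021: (1225, 193),
    2022: (1218, 211),
    2023: (1189, 196),
}


def pct_change(previous: int, current: int) -> float:
    """Percentage change from previous to current."""
    return (current - previous) / previous * 100


summary = {}
years = sorted(TABLE)
for prev_year, year in zip(years, years[1:]):
    total, fatal = TABLE[year]
    summary[year] = {
        "total_change_pct": pct_change(TABLE[prev_year][0], total),
        "fatal_share_pct": fatal / total * 100,
    }

print("Year  Change  Fatal share")
for year, row in summary.items():
    print(f"{year}  {row['total_change_pct']:+6.1f}%  {row['fatal_share_pct']:5.1f}%")
```

On these figures the 2020 total is roughly 17% below 2019, while the fatal share stays between about 16% and 18% across the period.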