
Repetitive loss: what FEMA's flood insurance claims data reveals about 2.7 million paid claims

8 min read · AI Analytics
Regulatory data · FEMA · NFIP · Climate risk · Insurance

The National Flood Insurance Program has paid 2.7 million claims since its records began, and it is chronically insolvent. A subset of properties in its database have been paid more than a dozen times, receiving cumulative payments that exceed their assessed value, while Congress has repeatedly blocked the reforms needed to stop it. The OpenFEMA datasets documenting all of this are public, ZIP-code level, and updated monthly.

The program

Congress created the National Flood Insurance Program in 1968 after private insurers largely abandoned flood coverage following a series of catastrophic losses. The logic was straightforward: flood risk is correlated and systemic in a way that makes it uninsurable at scale by private carriers, so the federal government would step in as insurer of last resort, set premiums based on FEMA flood maps, and use the collected premiums to pay claims.

The program now covers more than $1.3 trillion in exposure. It writes policies on roughly five million properties, primarily in coastal states and along major river systems. It is, by a wide margin, the largest source of flood insurance in the United States.

It has also never worked as designed from a fiscal standpoint. The program borrowed more than $17 billion from the Treasury after Katrina in 2005 and an additional $9 billion after Sandy in 2012, accumulating a debt that at its peak exceeded $36 billion. Congress has periodically forgiven tranches of that debt—most recently canceling $16 billion in 2017—but the structural deficit remains. Premium revenue does not cover expected losses, and the gap is not a rounding error. The actuarial shortfall is embedded in the program's design: politically driven premium suppression, outdated flood maps that understate risk in many coastal areas, and the repetitive-loss problem documented in the claims data.

The three datasets

FEMA publishes flood insurance data through its OpenFEMA portal as three distinct datasets. They are complementary: claims tells you what happened, policies tells you what coverage was in force, and the multiple-loss file tells you where the structural problem is concentrated.

FIMA NFIP Redacted Claims v2

This is the primary claims ledger: 2.7 million paid claim transactions, one row per claim. The schema includes reportedCity, reportedState, reportedZip, occupancyType (residential or commercial), floodZone, buildingDamageAmount, contentsDamageAmount, totalBuildingInsuranceCoverage, yearOfLoss, and causeOfDamage. The dataset is maintained at ZIP-code level resolution. Individual addresses and policyholder identifiers were removed in 2019 for reasons discussed below.

Monthly updates are published at fema.gov/openfema-data-page/fima-nfip-redacted-claims-v2. The full dataset downloads as a CSV of approximately 500 MB uncompressed.

FIMA NFIP Redacted Policies v2

The policies file is substantially larger: more than 80 million historical policy transactions, covering every policy written since the program's modern data systems began capturing structured records. Each row represents a policy term: where coverage was in force, at what premium, in what flood zone, and for what coverage amounts. The policies and claims files can be joined on reportedZip and floodZone to estimate loss ratios by geography and zone classification, though the redaction of property-level identifiers makes a precise join impossible.
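As a minimal sketch of that ZIP-and-zone join, the function below sums paid claims and collected premiums per (reportedZip, floodZone) key and divides one by the other. It assumes both files have already been read into lists of dictionaries. The `premium` key is a hypothetical stand-in for whatever premium column the policies file actually carries, and the policies file is assumed to expose the same reportedZip and floodZone names as the claims file.

```python
from collections import defaultdict


def loss_ratios(claims, policies):
    """Return {(zip, zone): paid / premium} for every key with premium > 0."""
    paid = defaultdict(float)
    for c in claims:
        key = (c["reportedZip"], c["floodZone"])
        # Missing amounts are treated as zero rather than dropping the row.
        paid[key] += (c.get("buildingDamageAmount") or 0) + (
            c.get("contentsDamageAmount") or 0
        )

    premium = defaultdict(float)
    for p in policies:
        key = (p["reportedZip"], p["floodZone"])
        # "premium" is a hypothetical column name used for illustration.
        premium[key] += p.get("premium") or 0

    # Keys with premium but no claims get a loss ratio of 0.0.
    return {k: paid.get(k, 0.0) / prem for k, prem in premium.items() if prem > 0}
```

A ratio above 1.0 for a key means more was paid in claims than was collected in premiums for that ZIP and zone. Because the join is on geography rather than on a property identifier, the result is an aggregate estimate, not a per-policy figure.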

The policies file is the substrate for actuarial analysis: comparing what was collected in premiums against what was paid in claims, by geography and flood zone, over time. The results are not flattering to the program's fiscal management.

NFIP Multiple Loss Properties v1

The third dataset is the most politically significant. It contains approximately 240,000 properties that have filed two or more flood insurance claims—the “repetitive loss” subset in FEMA's terminology. Within this group, a smaller category of “severe repetitive loss” properties have been paid out four or more times, or have received cumulative payments exceeding the property's pre-damage value.
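The two categories can be written down as a small classifier. This is a sketch of the simplified definitions given in the paragraph above only; FEMA's formal definitions carry additional conditions that are not modelled here, and the function and parameter names are illustrative.

```python
def classify_loss_history(claim_count, cumulative_paid, pre_damage_value):
    """Label a property using the simplified definitions in this article."""
    if claim_count < 2:
        return "not repetitive"
    # Severe: four or more paid claims, or cumulative payments above
    # the property's pre-damage value.
    if claim_count >= 4 or cumulative_paid > pre_damage_value:
        return "severe repetitive loss"
    return "repetitive loss"
```

For example, a property with two claims totalling 250,000 against a pre-damage value of 200,000 would be labelled severe under this rule, even though it has fewer than four claims.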

The most extreme cases in the data involve properties that have been paid out 10 to 15 times, receiving total payments that are multiples of the property's assessed value. The economics are straightforward: the property floods, FEMA pays the claim, the owner repairs or rebuilds in place, the property floods again. In many cases the cumulative claim payments would have been sufficient to purchase and demolish the property and buy the owner a comparable home on higher ground. Instead, the cycle continues indefinitely because there is no mechanism in the current program to force mitigation after repeated losses.

Congress addressed this directly in the Biggert-Waters Flood Insurance Reform Act of 2012, which required FEMA to move repetitive-loss properties to actuarially sound premiums and authorized buyout programs. Implementation was halted almost immediately by the Homeowner Flood Insurance Affordability Act of 2014, which rolled back the premium increases after coastal-state legislators objected to the political consequences. The multiple-loss properties dataset documents the result: a list of known problem properties that the federal government continues to insure at below-actuarial rates.

What FEMA redacted and why

The 2019 redaction of individual addresses and policyholder identifiers from the claims and policies datasets was not routine privacy hygiene. It followed a specific incident: journalists and researchers cross-referenced the then-address-level claims data with public property records and tax assessor databases to identify specific properties owned by prominent individuals in flood-prone coastal areas. The combination of address, flood zone, claim amount, and year of loss was sufficient to reconstruct meaningful financial and property histories for individual owners.

FEMA's response was to drop address-level granularity entirely and replace it with ZIP-code level resolution. The practical effect on analytical utility is significant: a ZIP code covering a coastal barrier island and its mainland approach may contain properties with radically different flood risk profiles, and aggregating to ZIP level obscures that variance. For repetitive-loss analysis specifically, the removal of property identifiers makes it impossible to track a single property's claim history across years; you can observe aggregate ZIP-level patterns but not individual property trajectories.

The multiple-loss properties dataset is a partial workaround. It was compiled before the 2019 redaction and retains some property-level structure, though it too has been progressively anonymized.

The repetitive loss signal in the data

Even at ZIP-code resolution, the claims data contains a clear repetitive-loss signal. The analytical approach is to aggregate buildingDamageAmount by reportedZip across all yearOfLoss values, and compare cumulative aggregate payouts against cumulative totalBuildingInsuranceCoverage in the same ZIP.

In a fiscally sound insurance program, aggregate payouts in a ZIP over a sufficiently long horizon should fall below aggregate coverage—claims are drawn from a pool that is replenished by premiums. The NFIP claims data shows specific ZIPs, concentrated in coastal Louisiana, coastal Texas, coastal New Jersey, and South Florida, where aggregate payouts have exceeded aggregate coverage across the dataset's time horizon. This is mathematically possible in a particular bad-loss year, but sustaining it across multiple decades is a structural signal, not a statistical artifact.

The query pattern for this analysis against the dataset:

SELECT
  reportedZip,
  reportedState,
  COUNT(*)                                  AS claim_count,
  SUM(buildingDamageAmount)                 AS total_building_paid,
  SUM(contentsDamageAmount)                 AS total_contents_paid,
  -- COALESCE: a NULL in either amount would otherwise drop that row from total_paid
  SUM(COALESCE(buildingDamageAmount, 0)
    + COALESCE(contentsDamageAmount, 0))    AS total_paid,
  AVG(totalBuildingInsuranceCoverage)       AS avg_coverage,
  MIN(yearOfLoss)                           AS first_loss_year,
  MAX(yearOfLoss)                           AS last_loss_year
FROM nfip_claims
GROUP BY reportedZip, reportedState
-- Repeat the aggregate: not every SQL engine accepts a SELECT alias in HAVING
HAVING COUNT(*) > 100
ORDER BY total_paid DESC;

ZIPs that appear at the top of this query sorted by total_paid, filtered to those where total_building_paid approaches or exceeds a multiple of avg_coverage, are the geographic centers of the repetitive-loss problem. They represent a structural subsidy: federal flood insurance premiums collected nationwide are being used to repeatedly rebuild specific coastal properties that would not be insurable at any actuarially honest premium.
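The filtering step can be sketched with Python's built-in sqlite3 module. The snippet assumes the claims have been loaded into a table named nfip_claims with the column names used in this article. The `min_claims` and `multiple` thresholds are analyst choices rather than values from the dataset, and `make_demo_db` is a hypothetical helper that exists only to make the example runnable on toy rows.

```python
import sqlite3

ZIP_QUERY = """
SELECT reportedZip, reportedState,
       COUNT(*)                               AS claim_count,
       SUM(COALESCE(buildingDamageAmount, 0)) AS total_building_paid,
       AVG(totalBuildingInsuranceCoverage)    AS avg_coverage
FROM nfip_claims
GROUP BY reportedZip, reportedState
HAVING COUNT(*) > :min_claims
ORDER BY total_building_paid DESC
"""


def flag_zips(conn, min_claims, multiple):
    """Return ZIPs whose building payouts reach `multiple` x average coverage."""
    flagged = []
    rows = conn.execute(ZIP_QUERY, {"min_claims": min_claims})
    for zip_code, state, n, paid, avg_cov in rows:
        # Skip ZIPs with missing or zero coverage to avoid dividing by zero.
        if avg_cov and paid >= multiple * avg_cov:
            flagged.append((zip_code, state, n, paid, paid / avg_cov))
    return flagged


def make_demo_db(rows):
    """Build an in-memory table from (zip, state, building_paid, coverage) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nfip_claims (reportedZip TEXT, reportedState TEXT, "
        "buildingDamageAmount REAL, totalBuildingInsuranceCoverage REAL)"
    )
    conn.executemany("INSERT INTO nfip_claims VALUES (?, ?, ?, ?)", rows)
    return conn
```

Calling `flag_zips(conn, min_claims=100, multiple=...)` against the full table would reproduce the filter described above. Which multiple counts as a warning sign is a judgment call, since total payouts in a ZIP with many claims will naturally exceed the average coverage on a single policy.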

Flood zone classifications

The floodZone field is one of the most analytically useful in the dataset. FEMA designates flood zones based on its Flood Insurance Rate Maps (FIRMs), and the zone classification determines both the mandatory purchase requirement and the base premium calculation.

The four zones most significant in the claims data:

  • AE zone — the 100-year floodplain. Properties here face a 1% annual chance of flooding by FEMA's estimate. Federally backed mortgage holders in AE zones are required to carry flood insurance. AE-zone claims represent the bulk of the dataset by volume.
  • VE zone — coastal high-hazard area with wave action. These are the highest-risk flood zones FEMA designates. VE-zone properties face not just inundation but breaking waves, which dramatically increases structural damage. Average payout per claim in VE zones is the highest of any zone classification in the dataset, reflecting both higher insured values and more severe structural damage.
  • X zone — minimal flood hazard, outside the 500-year floodplain. Flood insurance is not required here. Claims filed by X-zone properties are analytically interesting precisely because they should not occur frequently: their presence in the dataset at meaningful rates is evidence of flood map inaccuracy. Flood maps are updated infrequently, and in areas with significant development or changing drainage patterns, the X-zone designation may substantially understate actual risk.
  • AO zone — areas of sheet flow flooding, typically with 1–3 feet of depth. Common in the Midwest and in areas with poor drainage. AO-zone claims tend to be lower in dollar value than AE or VE but often repeat across the same geographies.

The distribution of claims across flood zones in the data does not always match the expected risk gradient. X-zone claim rates in coastal Texas and Florida are high enough to suggest that FEMA's maps are not accurately representing actual flood exposure in those areas—a problem that has been documented by the Government Accountability Office but has not been systematically corrected because FIRM updates are slow, expensive, and politically contentious in areas where reclassification would trigger mandatory insurance purchase requirements.
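A first pass at the X-zone question is to compute, per state, the share of claims filed from a given zone. The sketch below assumes claims are available as dictionaries with the reportedState and floodZone fields named in this article. Note that it measures a share of claims, not a claim rate: a true rate would need policy counts from the policies file as the denominator.

```python
from collections import Counter


def zone_share_by_state(claims, zone="X"):
    """Return {state: fraction of that state's claims filed from `zone`}."""
    totals = Counter()
    in_zone = Counter()
    for c in claims:
        state = c["reportedState"]
        totals[state] += 1
        # Normalize case and whitespace; missing zones never match.
        if (c.get("floodZone") or "").strip().upper() == zone:
            in_zone[state] += 1
    return {state: in_zone[state] / totals[state] for state in totals}
```

States where this share is unusually high relative to the national figure are the ones where the map-accuracy question is worth pursuing with the policies data.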

The climate signal in yearOfLoss

The yearOfLoss distribution in the claims dataset is not stationary. Annual claim counts through the 1980s and 1990s are relatively low and consistent. Post-2005 the distribution shifts: Katrina produced a spike that compressed prior years into relative insignificance, but the more analytically important pattern is the baseline shift that follows.

From 2012 onward, claim frequency in years without named major-hurricane landfalls exceeds the pre-Katrina average. The 2017 loss year is the largest in the dataset: Harvey alone produced more NFIP claims than any prior single storm, driven by its unusual movement pattern and the volume of precipitation it deposited over the Houston metropolitan area over several days. Irma and Maria added substantially to the 2017 totals. The 2019 loss year reflects widespread Midwest river flooding. The 2021 year captures Hurricane Ida and associated inland flooding across the Northeast.
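The baseline comparison can be sketched as a mean of annual claim counts over a window, with selected years left out. Which years to exclude as major-hurricane years is an analyst's assumption, not something encoded in the dataset, so the function takes them as an argument.

```python
from collections import Counter
from statistics import mean


def mean_annual_claims(claims, start, end, exclude_years=()):
    """Average claims per year over [start, end], skipping excluded years."""
    counts = Counter(int(c["yearOfLoss"]) for c in claims)
    years = [y for y in range(start, end + 1) if y not in exclude_years]
    if not years:
        raise ValueError("no years left after exclusions")
    # Years with no claims count as zero rather than being ignored.
    return mean(counts.get(y, 0) for y in years)
```

Comparing a call for a pre-Katrina window against a call for 2012 onward with the hurricane years excluded gives the baseline shift directly.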

Average claim amounts are also increasing over time when controlling for inflation. This reflects two things: rising property values in flood-prone coastal areas (higher insured values produce higher nominal payouts when damage rates are constant), and increasing damage severity per event. The second component is the climate signal proper—warmer sea-surface temperatures produce storms with higher rainfall rates, and higher rainfall rates produce deeper inundation per event, which produces higher per-claim damage on structures that have not changed.
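Controlling for inflation amounts to rescaling each claim by a price index before averaging. The sketch below assumes the caller supplies a `price_index` mapping of year to index level (for example, a consumer price series obtained separately); no index values ship with the claims data, and the function name is illustrative.

```python
from collections import defaultdict


def real_average_by_year(claims, price_index, base_year):
    """Average paid amount per claim by year, in base_year dollars."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c in claims:
        year = int(c["yearOfLoss"])
        nominal = (c.get("buildingDamageAmount") or 0) + (
            c.get("contentsDamageAmount") or 0
        )
        # Rescale nominal dollars to the base year's price level.
        factor = price_index[base_year] / price_index[year]
        sums[year] += nominal * factor
        counts[year] += 1
    return {year: sums[year] / counts[year] for year in sums}
```

This removes general price inflation only. Separating the property-value effect from the damage-severity effect would need a further adjustment, such as dividing paid amounts by insured coverage.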

Neither trend is favorable for the program's fiscal position. More frequent claims at higher average amounts, set against a premium base that is politically constrained from reflecting actuarial reality, produce a structural deficit that grows over time. The claims data makes this trajectory legible in a way that FEMA's budget documents do not.

Accessing the data

The Federal Regulatory Data Hub ingests the FIMA NFIP Redacted Claims v2 dataset and makes it queryable via REST and MCP at:

curl https://api.ai-analytics.org/datasets/fema-nfip-claims

The endpoint supports filtering by state, floodZone, yearOfLoss, occupancyType, and causeOfDamage, with aggregate functions available for buildingDamageAmount and contentsDamageAmount. The underlying data refreshes monthly in sync with the OpenFEMA publication schedule.
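A filtered request might be assembled as below. The filter names come from the list above, but the assumption that the endpoint accepts them as plain query-string keys is exactly that, an assumption; check the API's own documentation for the actual syntax. The helper only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.ai-analytics.org/datasets/fema-nfip-claims"
ALLOWED_FILTERS = ("state", "floodZone", "yearOfLoss", "occupancyType", "causeOfDamage")


def build_claims_url(**filters):
    """Build a request URL, assuming filters map to query-string keys."""
    unknown = set(filters) - set(ALLOWED_FILTERS)
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    # Emit parameters in a fixed order so the URL is reproducible.
    params = [(name, filters[name]) for name in ALLOWED_FILTERS if name in filters]
    return BASE_URL + ("?" + urlencode(params) if params else "")
```

With that assumption, `build_claims_url(state="TX", floodZone="X", yearOfLoss=2017)` yields a URL that can be passed to curl in place of the bare endpoint.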

For the raw OpenFEMA files directly, FEMA publishes all three datasets at fema.gov/about/openfema/data-sets under the “National Flood Insurance Program” category. The claims and policies files are available as paginated API responses or full CSV bulk downloads. The multiple-loss properties file is smaller and downloads as a single CSV.

One practical note on the bulk downloads: the policies file, at 80 million rows, is large enough that naive CSV parsing will exhaust memory on typical analyst machines. Stream the file row by row, or load it directly into a columnar format. The claims file, at 2.7 million rows, is manageable in memory on most systems.
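A streaming parse needs nothing beyond the standard library. The sketch below reads one row at a time and keeps only running totals in memory. The `premium` column name is a hypothetical placeholder, and the key column is assumed to match the reportedZip name used in this article.

```python
import csv
from collections import defaultdict


def stream_totals(lines, key_field="reportedZip", value_field="premium"):
    """Sum value_field per key_field without holding the file in memory."""
    totals = defaultdict(float)
    # csv.DictReader pulls one row at a time from any iterable of text lines.
    for row in csv.DictReader(lines):
        try:
            value = float(row.get(value_field) or "")
        except ValueError:
            continue  # skip blank or non-numeric values
        totals[row[key_field]] += value
    return dict(totals)
```

In practice the function would be handed an open file object, for example `with open(path, newline="") as f: stream_totals(f)`, so memory use scales with the number of distinct ZIPs rather than the number of rows.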

What the data cannot tell you

The 2019 redaction is a real analytical limitation and worth being explicit about. ZIP-code-level aggregation makes several important analyses impossible or highly approximate: you cannot reconstruct an individual property's claim history, you cannot precisely identify which specific properties within a high-claim ZIP are the repetitive-loss drivers, and you cannot join the claims data to property records or assessed values without making assumptions about the distribution of claims within the ZIP.

The multiple-loss properties dataset partially addresses the first limitation—it was designed specifically to track properties with repeated claims—but it too is progressively anonymized and does not contain current premium or coverage information.

For research applications requiring property-level flood insurance history, FEMA does provide access to non-redacted data under a data use agreement through its Research Transition Task Force process. The process is slow and approval is not guaranteed, but property-level access is available to researchers with a demonstrated public-interest use case. The redacted public datasets described here are suitable for geographic and temporal analysis at ZIP and state level, which covers most policy-relevant research questions about the program's structure and trajectory.

