DOJ False Claims Act Settlements: The $70 Billion Fraud Recovery Database
The False Claims Act is the federal government's oldest and most productive anti-fraud statute. Since Congress overhauled it in 1986, it has generated more than $70 billion in recoveries from companies that defrauded Medicare, Medicaid, defense contracts, and dozens of other federal programs. Most of that money came not from government investigators but from private citizens—whistleblowers who sued on the government's behalf and collected a share of the recovery. The settlements, the companies, the programs defrauded, and the whistleblower shares are all public. Almost no one treats them as a dataset.
The statute: 31 U.S.C. §§ 3729–3733
The False Claims Act dates to 1863, when President Lincoln signed it to stop defense contractors from selling the Union Army defective equipment and non-existent horses. The qui tam provision—from the Latin qui tam pro domino rege quam pro se ipso in hac parte sequitur, meaning “who as well for the king as for himself sues in this matter”—allowed private citizens with knowledge of fraud to bring suit on the government's behalf. The statute languished for a century before Congress dramatically strengthened it in 1986 in response to Pentagon procurement scandals.
The core liability provision at § 3729 imposes civil penalties of roughly $13,000 to $27,000 per false claim (the range is adjusted annually for inflation), plus treble damages: three times the government's actual losses. These multipliers make the FCA extraordinarily powerful. A hospital that submitted 50,000 fraudulent Medicare claims at $200 each faces not a $10 million exposure but $30 million in trebled damages, plus per-claim penalties that can dwarf the underlying loss: at the bottom of the penalty range, 50,000 claims carry about $650 million in penalties. The statute covers any false claim, false statement, or fraudulent course of conduct that causes the government to pay money it would not otherwise have paid, or to forgo money it was owed.
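The arithmetic in the hospital example can be checked with a short sketch. The function name `fca_exposure` and the rounded penalty bounds are assumptions for illustration. The result is a theoretical statutory exposure, not a prediction of what a court would award or what a defendant would settle for.

```python
def fca_exposure(num_claims, amount_per_claim,
                 penalty_low=13_000, penalty_high=27_000):
    """Theoretical FCA exposure: treble damages plus per-claim penalties.

    The penalty bounds are rounded assumptions; the real range is
    adjusted for inflation each year.
    """
    single = num_claims * amount_per_claim   # the government's actual loss
    treble = 3 * single                      # damages are trebled
    return {
        "single_damages": single,
        "treble_damages": treble,
        "penalties_low": num_claims * penalty_low,
        "penalties_high": num_claims * penalty_high,
        "total_low": treble + num_claims * penalty_low,
        "total_high": treble + num_claims * penalty_high,
    }


exposure = fca_exposure(num_claims=50_000, amount_per_claim=200)
for label, dollars in exposure.items():
    print(f"{label}: ${dollars:,}")
```

For the 50,000-claim example, per-claim penalties exceed the trebled damages by more than an order of magnitude, which is why the count of claims often matters more than their dollar value.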
The qui tam mechanism
A qui tam relator—the legal term for the private citizen plaintiff—files a complaint in federal district court under seal. The complaint is not served on the defendant and does not appear in the public docket. The Department of Justice Civil Division receives a copy and has 60 days to decide whether to intervene and take over the litigation, though in practice the investigation period routinely extends for years under court-ordered seal extensions. During the seal period, DOJ investigators review the relator's evidence, interview witnesses, issue civil investigative demands (the civil equivalent of grand jury subpoenas), and assess the strength of the case.
At the end of the investigation, DOJ makes one of two elections. If it intervenes, the government becomes the primary litigant and the relator steps into a secondary role as a party to the action. If DOJ declines to intervene, the relator may proceed independently—though cases that DOJ has declined are statistically less likely to result in large recoveries. The intervention decision is the critical fork in every qui tam case.
When a case resolves, by settlement or judgment, the relator receives between 15% and 30% of the recovery. Government-intervened cases carry a statutory relator share of 15–25%; declined cases where the relator litigated alone carry a share of 25–30%. The relator's share is paid from the recovery before the remainder goes to the Treasury. The percentage applies to the civil FCA recovery, not to any accompanying criminal fine. In the GlaxoSmithKline $3 billion resolution of 2012, for example, the relator shares across three separate qui tam actions that had been consolidated for resolution were drawn from the $2 billion civil component, which caps the collective award at 30% of that amount, well short of $1 billion.
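The statutory ranges translate directly into a bounds check that is useful when validating extracted relator shares. This is a minimal sketch; the function name and the $100 million recovery are illustrative assumptions, and integer percentages are used to avoid floating-point rounding.

```python
def relator_share_range(recovery_usd, intervened):
    """Return (min, max) relator share in whole dollars.

    Intervened cases: 15-25% of the recovery.
    Declined cases the relator litigated alone: 25-30%.
    """
    low_pct, high_pct = (15, 25) if intervened else (25, 30)
    return recovery_usd * low_pct // 100, recovery_usd * high_pct // 100


# Illustrative recovery of $100 million (not a real case).
intervened_range = relator_share_range(100_000_000, intervened=True)
declined_range = relator_share_range(100_000_000, intervened=False)
print(intervened_range)
print(declined_range)
```

An extracted relator share that falls outside these bounds for its reported recovery is a signal that the scraper picked up the wrong dollar figure, or that the release describes something other than a straightforward qui tam award.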
What DOJ publishes
The DOJ Civil Division publishes two distinct data sources on False Claims Act activity. The first is the annual FCA statistics document, available as a PDF at justice.gov/civil/false-claims-act. This document provides fiscal-year totals for new qui tam filings, total FCA filings, total recoveries, and recoveries by sector going back to 1986. It is the canonical source for the $70 billion cumulative figure and for tracking the share of FCA activity that originates with whistleblowers versus government-initiated cases.
The second source is individual settlement and judgment press releases, published by the Civil Division and by individual United States Attorney's Offices nationwide. Each press release typically contains: the defendant company and any individual defendants, the settlement or judgment amount, a description of the fraudulent conduct, the federal programs defrauded, the district court where the case was filed, the names of the relators if they have consented to disclosure, and the relator share amount. These press releases are published inconsistently—some cases settle quietly without a press release—but major resolutions almost always generate at least one DOJ announcement.
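The fields listed above suggest a record schema. The following is one possible sketch; the class and field names are assumptions chosen for this illustration, not a format DOJ publishes, and the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SettlementRecord:
    """One press release, reduced to the fields a release typically carries."""
    url: str
    defendants: List[str] = field(default_factory=list)
    amount_usd: Optional[float] = None          # settlement or judgment amount
    conduct: str = ""                           # description of the fraud
    programs: List[str] = field(default_factory=list)
    district: str = ""                          # court where the case was filed
    relators: List[str] = field(default_factory=list)   # empty if undisclosed
    relator_share_usd: Optional[float] = None   # None if not disclosed

    def share_fraction(self):
        """Relator share as a fraction of the recovery, if both are known."""
        if self.amount_usd and self.relator_share_usd is not None:
            return self.relator_share_usd / self.amount_usd
        return None


# Made-up example values, for illustration only.
sample = SettlementRecord(
    url="https://www.justice.gov/example-release",
    defendants=["Example Health System"],
    amount_usd=10_000_000,
    programs=["Medicare"],
    relator_share_usd=2_000_000,
)
print(sample.share_fraction())
```

Keeping undisclosed fields as `None` or empty lists, rather than zero, preserves the distinction between "no relator share" and "relator share not reported".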
There is no official bulk download of FCA settlement records. Building a comprehensive database requires scraping both the DOJ Civil Division page and the USAO press release archives across all 94 federal judicial districts.
Sector breakdown: healthcare dominates
Healthcare fraud has accounted for 70–80% of annual FCA recoveries in every recent fiscal year, and the margin has widened over time as DOJ has invested in healthcare enforcement infrastructure. The DOJ annual statistics consistently show healthcare recoveries running five to ten times the recoveries from the next largest sector.
The mechanisms of healthcare fraud covered by the FCA are numerous and technically specific. Upcoding is the submission of claims coded for more expensive services than were actually provided: billing for a complex office visit when only a brief consultation occurred, or billing for a surgical procedure at a higher complexity level than the documentation supports. Unbundling is billing separately for procedures that should be billed together at a combined lower rate. Medical-necessity cases involve services ordered and billed without clinical justification. Kickback and self-referral violations (paying physicians for referrals in violation of the Anti-Kickback Statute, or maintaining financial relationships barred by the Stark Law) generate FCA liability when the resulting claims are submitted to federal health programs.
Defense procurement fraud is the second-largest sector, historically accounting for 10–15% of annual FCA recoveries. Defense cases typically involve defective product claims (delivering equipment that does not meet contract specifications), false cost certifications, and improper cost allocations to government contracts. The defense sector generates fewer cases than healthcare, but with higher average settlement values per case.
COVID-19 relief programs produced a surge in FCA activity from 2020 through 2023. The Paycheck Protection Program, Economic Injury Disaster Loans, and Provider Relief Fund collectively distributed hundreds of billions of dollars with limited initial verification. DOJ established a COVID-19 Fraud Enforcement Task Force in 2021 and has brought hundreds of FCA cases involving PPP loans obtained through false certifications, fraudulent EIDL applications, and healthcare providers who claimed Provider Relief Fund payments for services they did not render. COVID FCA cases have expanded the universe of small-dollar defendants considerably relative to the historically large-company profile of FCA enforcement.
Major cases and the shape of the data
GlaxoSmithKline's 2012 resolution for $3 billion, at the time the largest healthcare fraud settlement in U.S. history, covered off-label promotion of Paxil and Wellbutrin, failure to report safety data about Avandia to the FDA, and payment of kickbacks to physicians. The criminal plea component was $1 billion; the civil FCA resolution was $2 billion. Three qui tam relators brought the underlying whistleblower cases. Because relator shares are a percentage of the civil recovery, their combined award is bounded by the 30% statutory maximum applied to the $2 billion civil component.
Abbott Laboratories settled for $1.5 billion in 2012 over off-label promotion of Depakote for uses not approved by the FDA, including the use of the drug as a sedative in elderly nursing home patients. Abbott's sales force had been trained to promote the drug for dementia, schizophrenia, and agitation—none of which were FDA-approved indications—resulting in claims to Medicare and Medicaid for medically unsupported prescriptions. A former Abbott district sales manager was one of the qui tam relators.
Purdue Pharma's 2020 resolution totaled $8.3 billion across criminal and civil components, though the practical recovery was limited by Purdue's bankruptcy. The FCA component addressed Purdue's payments to physicians who were high prescribers of OxyContin and its promotion of the drug at doses above those supported by clinical evidence. The resolution was structured to allow the Sackler family to retain significant assets through the bankruptcy proceeding, a feature that subsequent litigation and congressional scrutiny contested.
Hospital upcoding cases have generated consistent FCA activity across the country. Health Management Associates settled for $260 million in 2018 over a years-long scheme in which hospital administrators pressured emergency department physicians to admit patients who did not meet inpatient admission criteria, generating higher-reimbursing inpatient claims to Medicare instead of lower-reimbursing outpatient claims. Similar cases have been brought against Community Health Systems, Tenet Healthcare, and dozens of regional hospital systems.
Qui tam statistics: the whistleblower transformation
The 1986 amendments transformed the FCA from a rarely used statute into the government's primary fraud recovery tool. Before 1986, qui tam filings numbered in the single digits per year. In fiscal year 2022, DOJ received 652 new qui tam complaints, more than two per business day. Relator-initiated cases have risen from roughly 30% of FCA activity in the late 1980s to more than 80% of activity by the 2010s and 2020s. In fiscal year 2022, of $2.2 billion in total FCA recoveries, $1.9 billion came from qui tam cases.
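The fiscal year 2022 figures quoted above can be turned into the two ratios that matter for tracking the whistleblower share. This is a quick sketch using only the numbers in the paragraph; the variable names are illustrative.

```python
# Fiscal year 2022 figures as quoted in the text above.
qui_tam_filings = 652
total_recoveries_usd = 2.2e9
qui_tam_recoveries_usd = 1.9e9

filings_per_week = qui_tam_filings / 52
qui_tam_recovery_share = qui_tam_recoveries_usd / total_recoveries_usd

print(f"New qui tam complaints per week: {filings_per_week:.1f}")
print(f"Share of recoveries from qui tam cases: {qui_tam_recovery_share:.0%}")
```

Running the same two calculations over each row of the annual statistics PDF gives a time series of filing volume and whistleblower share.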
The qui tam bar has become a specialized practice area. Firms that represent whistleblowers in FCA cases work on contingency—they receive a portion of the relator's share as their fee—and have developed sophisticated systems for evaluating and filing qui tam complaints. Former government employees, healthcare compliance officers, pharmaceutical sales representatives, and hospital billing staff are the most common relator profiles. The statute's first-to-file rule—only the first relator to file a complaint on a given fraud scheme receives the qui tam share—creates competitive pressure for relators and their counsel to file quickly once they have identified a viable case.
Building a settlements database: the scraping approach
The DOJ press release corpus is the primary raw material for a machine-readable FCA settlements database. Each press release is structured HTML containing the settlement amount, defendant names, program defrauded, relator information, and the legal citation for the underlying conduct. A systematic scraper can extract these fields with moderate accuracy using pattern matching on the press release text, with manual review for edge cases.
The Civil Division's FCA statistics PDFs provide annual aggregate figures that can validate totals extracted from press releases. Because press releases are not published for every settlement—particularly smaller cases resolved by individual USAO offices without Civil Division involvement—the press release corpus undercounts total FCA activity. The aggregate statistics PDFs show the true total; the gap between the two represents cases with no public press release.
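That validation step can be sketched independently of any scraper. The sketch assumes each extracted record carries a `date` and an `amount` (hypothetical key names) and that the official totals have been transcribed by hand from the statistics PDF. The numbers below are placeholders, not DOJ figures. Press release amounts may also not line up exactly with how DOJ attributes recoveries to fiscal years, so the resulting coverage ratio is approximate.

```python
from collections import defaultdict
from datetime import date


def federal_fiscal_year(d):
    """Federal fiscal year N runs from October 1 of N-1 to September 30 of N."""
    return d.year + 1 if d.month >= 10 else d.year


def totals_by_fiscal_year(records):
    """Sum extracted amounts per fiscal year, skipping incomplete records."""
    totals = defaultdict(float)
    for rec in records:
        if rec.get("date") is None or rec.get("amount") is None:
            continue
        totals[federal_fiscal_year(rec["date"])] += rec["amount"]
    return dict(totals)


def coverage(scraped_totals, official_totals):
    """Fraction of each year's official total captured by press releases."""
    return {
        fy: scraped_totals.get(fy, 0.0) / official
        for fy, official in official_totals.items()
        if official
    }


# Placeholder data for illustration only.
records = [
    {"date": date(2021, 10, 15), "amount": 5_000_000.0},  # falls in FY2022
    {"date": date(2022, 9, 30), "amount": 3_000_000.0},   # last day of FY2022
    {"date": date(2022, 10, 1), "amount": 2_000_000.0},   # first day of FY2023
    {"date": None, "amount": 9_000_000.0},                # skipped: no date
]
scraped = totals_by_fiscal_year(records)
ratios = coverage(scraped, {2022: 10_000_000.0})
print(scraped)
print(ratios)
```

A coverage ratio well below 1.0 for a given year points to settlements that never received a press release, or to releases the search query failed to surface.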
import json
import re
import time

import requests
from bs4 import BeautifulSoup

# DOJ False Claims Act press releases live under the Civil Division news section.
# The search endpoint accepts keyword queries and returns paginated HTML results.
# There is no official bulk download; systematic scraping is required.
BASE_SEARCH = "https://www.justice.gov/civil/false-claims-act"
PRESS_SEARCH = "https://www.justice.gov/news/press-releases"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (research; contact research@example.org)"
}

def search_fca_releases(keyword="false claims act settlement", page=0):
    """Search DOJ press releases for FCA settlements."""
    params = {
        "combine": keyword,
        "page": page,
    }
    resp = requests.get(PRESS_SEARCH, params=params, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.text

def parse_release_list(html):
    """Parse a DOJ press release search results page."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # DOJ uses view-content rows for press release listings
    for item in soup.select(".views-row, article.views-row"):
        a_tag = item.find("a", href=True)
        if not a_tag:
            continue
        href = a_tag["href"]
        # Keep only site-relative links; skip external or malformed hrefs
        if not href.startswith("/"):
            continue
        date_tag = item.find(class_=re.compile(r"date|time", re.I))
        results.append({
            "url": "https://www.justice.gov" + href,
            "title": a_tag.get_text(strip=True),
            "raw_date": date_tag.get_text(strip=True) if date_tag else "",
        })
    return results

# Dollar figures as written in releases: "$3 billion", "$1.5 million", "$450,000"
_AMOUNT = r"\$(\d[\d,]*(?:\.\d+)?)(?:[\s\xa0]*(billion|million|thousand))?"
_MULTIPLIERS = {"billion": 1e9, "million": 1e6, "thousand": 1e3}


def _to_dollars(match):
    """Convert an _AMOUNT regex match to a float number of dollars."""
    raw = float(match.group(1).replace(",", ""))
    scale = match.group(2)
    return raw * _MULTIPLIERS[scale.lower()] if scale else raw


def parse_settlement_release(url):
    """Fetch and parse a single FCA settlement press release."""
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    body_el = soup.select_one(".field--name-body, #content-start, .field-items")
    body_text = body_el.get_text(" ", strip=True) if body_el else ""
    record = {
        "url": url,
        "title": "",
        "date": "",
        "defendant": "",          # left empty here; needs entity extraction
        "settlement_amount": None,
        "program_defrauded": "",
        "relator": "",            # left empty here; needs name extraction
        "relator_share": None,
        "doj_division": "",       # left empty here
        "body_excerpt": body_text[:500],
    }
    # Title
    h1 = soup.find("h1")
    if h1:
        record["title"] = h1.get_text(strip=True)
    # Date: DOJ typically places it in a <time> element or a .date-display-single span
    time_el = soup.find("time")
    if time_el:
        record["date"] = time_el.get("datetime", time_el.get_text(strip=True))
    # Dollar amount: the FIRST figure in the body, which is usually but not
    # always the settlement total
    amt_match = re.search(_AMOUNT, body_text, re.I)
    if amt_match:
        record["settlement_amount"] = _to_dollars(amt_match)
    # Relator share: first figure after "relator" in the same sentence,
    # e.g. "relator will receive approximately $X million"
    rel_match = re.search(r"relator[^.]*?" + _AMOUNT, body_text, re.I)
    if rel_match:
        record["relator_share"] = _to_dollars(rel_match)
    # Program defrauded: first program in this list that the body names.
    # Matching is case-sensitive on word boundaries so that "VA" does not
    # match inside words such as "private".
    for prog in ["Medicare", "Medicaid", "TRICARE", "CHAMPUS", "VA", "SNAP",
                 "Small Business Administration", "COVID", "PPP", "EIDL",
                 "Department of Defense", "Defense", "HUD"]:
        if re.search(r"\b" + re.escape(prog) + r"\b", body_text):
            record["program_defrauded"] = prog
            break
    return record

def scrape_fca_database(max_pages=40):
    """Paginate through DOJ FCA press releases and parse each one."""
    all_releases = []
    seen_urls = set()
    for page in range(max_pages):
        html = search_fca_releases(page=page)
        items = parse_release_list(html)
        if not items:
            print(f"No results on page {page}, stopping.")
            break
        print(f"Page {page}: {len(items)} releases")
        for item in items:
            url = item["url"]
            if url in seen_urls:
                continue
            seen_urls.add(url)
            try:
                record = parse_settlement_release(url)
                record["list_date"] = item["raw_date"]
                all_releases.append(record)
                time.sleep(1.0)  # pause between release fetches
            except Exception as e:
                # Broad on purpose: one bad page should not abort the run
                print(f"Error on {url}: {e}")
        time.sleep(1.5)  # pause between result pages
    return all_releases

if __name__ == "__main__":
    # Run the scraper
    settlements = scrape_fca_database(max_pages=40)

    # Write to JSON for downstream analysis
    with open("fca_settlements.json", "w", encoding="utf-8") as f:
        json.dump(settlements, f, indent=2, default=str)

    # Summary statistics
    total = len(settlements)
    with_amount = [s for s in settlements if s["settlement_amount"]]
    total_recovered = sum(s["settlement_amount"] for s in with_amount)
    with_relator = [s for s in settlements if s["relator_share"]]
    print(f"Total releases parsed: {total}")
    print(f"Releases with dollar amount: {len(with_amount)}")
    print(f"Total recovered (subset): ${total_recovered / 1e9:.2f}B")
    print(f"Releases with relator share: {len(with_relator)}")

The scraper above covers the core extraction logic. Production-grade implementations add entity resolution to handle variant company names across press releases (“GlaxoSmithKline LLC” and “GlaxoSmithKline plc” should resolve to the same defendant), geocoding of defendant headquarters, and classification of the underlying fraud theory using keyword matching on the body text. The settlement amount and program-defrauded fields extracted by the scraper are rough heuristics: the amount is simply the first dollar figure in the body, and a more precise program classification requires reading each press release's factual description.
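Two of those production additions, entity resolution and fraud-theory classification, can be sketched with the standard library alone. Everything here is an illustrative assumption: the suffix list is incomplete, the keyword lists are a starting point rather than a validated taxonomy, and real entity resolution usually also needs a hand-maintained alias table for parents and subsidiaries.

```python
import re
from collections import defaultdict

# Assumption: a small, incomplete list of corporate suffixes to strip.
_SUFFIXES = re.compile(
    r"\b(?:inc|incorporated|corp|corporation|co|company|llc|plc|ltd|lp|llp)\b\.?"
)

# Assumption: illustrative keyword stems per fraud theory.
THEORY_KEYWORDS = {
    "upcoding": ["upcod"],
    "unbundling": ["unbundl"],
    "medical_necessity": ["medically unnecessary", "not medically necessary"],
    "kickbacks": ["kickback", "stark law"],
    "off_label": ["off-label"],
}


def normalize_defendant(name):
    """Reduce a company name to a comparison key (lowercase, no suffixes)."""
    cleaned = _SUFFIXES.sub(" ", name.lower())
    cleaned = re.sub(r"[^a-z0-9& ]+", " ", cleaned)   # drop punctuation
    return re.sub(r"\s+", " ", cleaned).strip()


def group_by_defendant(records):
    """Group records whose 'defendant' values normalize to the same key."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize_defendant(rec["defendant"])].append(rec)
    return dict(groups)


def classify_theories(body_text):
    """Return every fraud theory whose keywords appear in the release body."""
    text = body_text.lower()
    return [theory for theory, stems in THEORY_KEYWORDS.items()
            if any(stem in text for stem in stems)]


groups = group_by_defendant([
    {"defendant": "GlaxoSmithKline LLC"},
    {"defendant": "GlaxoSmithKline plc"},
    {"defendant": "Abbott Laboratories, Inc."},
])
print(sorted(groups))
print(classify_theories(
    "The company paid kickbacks to physicians and promoted off-label uses."
))
```

Returning a list of theories rather than a single label reflects how releases are written: a single settlement often resolves kickback, off-label, and billing allegations together.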
How compliance teams and journalists use the database
Corporate compliance teams use FCA settlement databases to benchmark their own company's exposure and to monitor competitors. A pharmaceutical company that sees repeated FCA settlements in a particular therapeutic area—off-label promotion of antipsychotics, for instance—can assess whether its own promotional practices would survive DOJ scrutiny. The press releases name not just the companies but often the specific conduct: the names of physicians who received improper payments, the specific billing codes at issue, the hospitals involved in upcoding schemes. That level of specificity is useful for compliance officers trying to identify analogous risk in their own operations.
Healthcare fraud attorneys use the settlement database to evaluate qui tam complaints before filing. The first-to-file rule creates strong incentives to check whether a similar case is already under seal before investing resources in a new qui tam. The public settlement database shows what has already resolved; inference about what might still be under seal requires reviewing PACER for sealed dockets in relevant districts.
Investigative journalists use FCA settlement data to identify fraud patterns across companies and time. The opioid coverage of the 2010s was substantially informed by FCA press releases that named specific physician-payment schemes and named the pharmaceutical sales representatives who implemented them. The COVID fraud coverage of 2021–2023 was similarly driven by DOJ press releases that named specific defendants, the amounts they fraudulently obtained, and in many cases the investigative techniques used to detect the fraud. Joining FCA settlements against CMS Open Payments data—which tracks every physician payment by pharmaceutical and device companies—allows journalists to identify physicians who received payments from companies that subsequently settled FCA cases in the same therapeutic areas where those physicians prescribed.
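The Open Payments join described above can be sketched as a keyed lookup. The field names (`company`, `therapeutic_area`, `physician_id`, `amount_usd`) are hypothetical simplifications, not the actual column names in the CMS files, and the rows are made up. In practice the company names on both sides would need normalizing first, and therapeutic area would have to be derived from the product named in each payment.

```python
from collections import defaultdict


def _join_key(company, area):
    """Case- and whitespace-insensitive key for (company, therapeutic area)."""
    return (company.strip().lower(), area.strip().lower())


def physicians_paid_by_settling_companies(settlements, payments):
    """Total payments per physician from companies that settled FCA cases
    in the same therapeutic area as the payment."""
    settled = {_join_key(s["company"], s["therapeutic_area"]) for s in settlements}
    totals = defaultdict(float)
    for p in payments:
        if _join_key(p["company"], p["therapeutic_area"]) in settled:
            totals[p["physician_id"]] += p["amount_usd"]
    return dict(totals)


# Made-up rows for illustration only.
settlements = [
    {"company": "Example Pharma", "therapeutic_area": "Antipsychotics"},
]
payments = [
    {"physician_id": "P1", "company": "example pharma ",
     "therapeutic_area": "antipsychotics", "amount_usd": 1200.0},
    {"physician_id": "P1", "company": "Example Pharma",
     "therapeutic_area": "Antipsychotics", "amount_usd": 300.0},
    {"physician_id": "P2", "company": "Other Co",
     "therapeutic_area": "Antipsychotics", "amount_usd": 500.0},
    {"physician_id": "P3", "company": "Example Pharma",
     "therapeutic_area": "Oncology", "amount_usd": 900.0},
]
result = physicians_paid_by_settling_companies(settlements, payments)
print(result)
```

A match produced this way is a lead for further reporting, not evidence of wrongdoing: most payments from a company that later settled are unrelated to the conduct in the settlement.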
Public health researchers use the healthcare FCA settlement corpus to study the geography of billing fraud. Hospital upcoding cases cluster in regions with high Medicare Advantage penetration and thin hospital margins; kickback cases cluster around high-volume referral networks in high-cost markets. The settlement press releases name the specific hospitals, the time periods at issue, and the approximate number of false claims submitted—enough information to map fraud activity at a facility level across the country.
Data access via the Federal Data Hub
The Federal Data Hub indexes DOJ FCA settlement press releases with entity resolution at https://api.ai-analytics.org/datasets/doj-false-claims-act. The endpoint supports filtering by sector, program defrauded, settlement amount range, and date. Each record includes the original press release URL, the extracted dollar amounts, and the relator share where disclosed.
For the related database of federal deferred prosecution and non-prosecution agreements, including DPAs with healthcare and pharmaceutical companies: The DPA database: every federal deferred prosecution agreement since 1992 →
For the HHS Office for Civil Rights breach database covering HIPAA enforcement actions that often accompany FCA investigations at healthcare entities: HHS OCR HIPAA breach data: the complete enforcement record →
For EPA enforcement settlements and penalty data, which follow a similar press release structure and can be cross-referenced against FCA cases involving environmental fraud: EPA enforcement data: penalties, injunctive relief, and compliance orders →