
PACER Federal Courts: The Database Behind 1 Billion Federal Court Documents

12 min read · AI Analytics
Tags: PACER, Federal Courts, Judiciary, Legal Data, Federal Data

The Public Access to Court Electronic Records (PACER) system holds dockets and documents for federal district, bankruptcy, and appellate cases filed since the 1980s: over 1 billion documents, 900+ million pages, accessible for a per-page fee that generated $100M+ in annual revenue before the quarterly fee waiver was expanded. PACER is simultaneously the most comprehensive public legal database in the world and one of the most difficult to access in bulk.

What PACER is

PACER—Public Access to Court Electronic Records—is the public-facing access layer of the federal judiciary's electronic case management infrastructure. The system is administered by the Administrative Office of the United States Courts (AO), which supports the federal judiciary's IT operations across all Article III courts and the Court of International Trade, the Court of Federal Claims, and the Judicial Panel on Multidistrict Litigation (JPML).

The backend system is CM/ECF—Case Management/Electronic Case Files—which is what lawyers and clerks use to file and manage cases electronically. PACER is the public portal layered on top of CM/ECF. Attorneys file through CM/ECF; the public reads those filings through PACER. The two are often used interchangeably but are technically distinct: CM/ECF is the filing system, PACER is the retrieval system.

Unlike many federal databases that have a central repository, each court maintains its own PACER instance. The 94 federal district courts, 94 bankruptcy courts, 13 circuit courts of appeals, the JPML, the Court of International Trade, and the Court of Federal Claims each run their own CM/ECF system. A single PACER account created at pacer.gov provides unified login across all of them, but the underlying databases are not merged—searching for a case requires knowing which court to query, or using the PACER Case Locator to search across all courts simultaneously.
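Because each court runs its own instance, a client typically has to resolve a court identifier to that court's host before it can query anything. The sketch below assumes the commonly seen `ecf.<court>.uscourts.gov` hostname pattern and short court identifiers such as `nysd`; neither is stated above, the function name is hypothetical, and individual courts may deviate, so treat the result as a starting guess to verify.

```python
# Assumption: CM/ECF instances are commonly hosted at
# ecf.<court_id>.uscourts.gov. Verify the host for each court before use.
def cmecf_host(court_id: str) -> str:
    """Return the assumed CM/ECF hostname for a short court identifier."""
    court_id = court_id.strip().lower()
    if not court_id.isalnum():
        raise ValueError(f"unexpected court identifier: {court_id!r}")
    return f"ecf.{court_id}.uscourts.gov"


for cid in ("nysd", "cacd", "txsd"):
    print(cid, "->", cmecf_host(cid))
```

Keeping this mapping in one function means a per-court override table can be added later without touching the calling code.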

What is in the system

The scope of PACER is broad, but not unlimited. Understanding what falls inside and outside the system matters for anyone building applications on top of it.

Civil dockets are the core of PACER for most research uses. Every civil complaint filed in federal district court generates a docket entry. Subsequent filings (answers, motions to dismiss, discovery orders, summary judgment briefing, trial exhibits, opinions, judgments, and notices of appeal) each generate additional docket entries with linked documents. A fully litigated civil case may have hundreds of docket entries. The docket text summarizes each filing; the linked document is the actual PDF. Both are in PACER.

Criminal dockets are also in PACER, though with important exceptions. Plea agreements, sentencing memoranda, and PSRs (pre-sentence reports) are often sealed. Juvenile cases, immigration removal proceedings, and national security matters handled under the Classified Information Procedures Act (CIPA) generate sealed or restricted dockets. A criminal docket visible in PACER may represent only a portion of the actual case file.

Bankruptcy filings are held in the bankruptcy court CM/ECF instances, which are separate from the district court systems. Chapter 7 liquidation petitions, Chapter 11 reorganization plans, Chapter 13 wage-earner plans, schedules of assets and liabilities, statements of financial affairs, proofs of claim, trustee reports, and adversary proceeding dockets (separate mini-cases litigated within a bankruptcy) are all in PACER via the bankruptcy court instances.

Appellate dockets are in PACER via the circuit court instances. Briefs, appendices, oral argument recordings (for courts that post them), and opinions are accessible. The Supreme Court is not part of PACER; it maintains its own separate electronic filing system.

What PACER does not contain: sealed documents (though their existence on the docket is often indicated), grand jury materials, immigration court proceedings (those are EOIR, not Article III courts), most state court records, and records predating the electronic filing era that were never digitized.

Case number format and docket structure

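Federal case numbers follow a standardized format that encodes the divisional office, filing year, case type, and sequential number within that court and year. The court itself is not part of the number, so a case number is unambiguous only together with the court that issued it. The format is: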

Component   Example          Meaning
1           1:23-cv-12345    Division number within the district (optional in some districts)
23          1:23-cv-12345    Two-digit filing year (2023)
cv          1:23-cv-12345    Case type: civil
12345       1:23-cv-12345    Sequential docket number within that court and year

The case type codes are standardized across all district courts: cv for civil, cr for criminal, mc for miscellaneous, and po for petty offense. Bankruptcy courts use bk for main bankruptcy cases and ap for adversary proceedings (the separate mini-cases filed within a bankruptcy). Each docket entry carries an entry number, the date filed, docket text summarizing the filing, and a link to the document if one was filed.
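The format described above can be sketched as a small parser. This is a minimal illustration, not an official grammar: the regular expression, the `CaseNumber` type, and the rule that two-digit years of 70 or above belong to the 1900s are all assumptions made for the example, and real dockets often carry suffixes (judge initials, defendant numbers) that this pattern rejects.

```python
import re
from typing import NamedTuple, Optional

# Case type codes listed in the text above.
CASE_TYPES = {
    "cv": "civil", "cr": "criminal", "mc": "miscellaneous",
    "po": "petty offense", "bk": "bankruptcy", "ap": "adversary proceeding",
}

# Optional "<office>:" prefix, then "<yy>-<type>-<sequence>".
CASE_NUMBER_RE = re.compile(
    r"^(?:(?P<office>\d+):)?(?P<year>\d{2})-(?P<type>[a-z]{2})-(?P<seq>\d+)$"
)


class CaseNumber(NamedTuple):
    office: Optional[int]   # division number, absent in some districts
    year: int               # four-digit filing year
    case_type: str          # two-letter code such as "cv"
    sequence: int           # sequential number within court and year


def parse_case_number(text: str) -> CaseNumber:
    match = CASE_NUMBER_RE.match(text.strip().lower())
    if match is None:
        raise ValueError(f"not a recognized case number: {text!r}")
    yy = int(match["year"])
    # Assumption: 70-99 means 1970-1999, everything else means 20xx.
    year = 1900 + yy if yy >= 70 else 2000 + yy
    office = int(match["office"]) if match["office"] else None
    return CaseNumber(office, year, match["type"], int(match["seq"]))


parsed = parse_case_number("1:23-cv-12345")
print(parsed, "->", CASE_TYPES.get(parsed.case_type, "unknown type"))
```

Returning the raw two-letter code rather than the description keeps unknown codes usable instead of failing on them.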

Party codes appear throughout docket entries and case metadata: pla for plaintiff, dft for defendant, db for debtor, tr for trustee, cr for creditor, int for intervenor, and app for appellant. These codes drive the party search functionality in the PACER Case Locator.
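A lookup table is usually enough to turn these codes into readable roles. The sketch below groups parties by role; the `(name, code)` input shape, the function name, and the sample parties are hypothetical, chosen only to illustrate the mapping.

```python
from collections import defaultdict

# Party codes listed in the text above.
PARTY_ROLES = {
    "pla": "plaintiff", "dft": "defendant", "db": "debtor", "tr": "trustee",
    "cr": "creditor", "int": "intervenor", "app": "appellant",
}


def group_by_role(parties):
    """Group (name, code) pairs by role; unrecognized codes go to 'unknown'."""
    grouped = defaultdict(list)
    for name, code in parties:
        role = PARTY_ROLES.get(code.strip().lower(), "unknown")
        grouped[role].append(name)
    return dict(grouped)


# Hypothetical sample data.
sample = [("Acme Corp.", "pla"), ("Widget LLC", "dft"), ("Jane Roe", "INT")]
print(group_by_role(sample))
```

Note that cr means creditor as a party code but criminal as a case type code, so the two tables should stay separate rather than being merged into one dictionary.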

The fee controversy

PACER's per-page fee structure has been its most contentious feature since its inception. The fee was set at $0.07 per page in the early years, raised to $0.08, and eventually settled at $0.10 per page, where it stood for many years. The fee applies not just to downloaded documents but to docket sheets themselves: a docket sheet is billed per page like any PDF, so simply checking what has been filed in a case costs money before a single document is opened, which can add up quickly in complex litigation.

The revenue generated substantially exceeds the costs of operating the PACER system. The Administrative Office reported annual PACER revenue exceeding $100 million in multiple years, with the surplus deposited in the Judiciary Information Technology Fund and used to finance broader court IT infrastructure including CM/ECF development, electronic filing systems, and the federal judiciary's network operations. Critics argued for decades that public court records, which are created at public expense and record the exercise of public judicial power, should not require payment for access, and that the surplus revenue effectively constituted a tax on access to justice.

Congress and the judiciary addressed the fee structure in fits and starts. The E-Government Act of 2002 directed the courts to make PACER documents freely available to the extent possible. In practice, fees were waived each quarter for users who accrued less than $15 in charges, meaning casual or low-volume users paid nothing. Effective January 2020, the Judicial Conference raised this threshold to $30 per quarter, substantially expanding free access for individuals and small organizations. Nonprofits, academic institutions, and government entities can apply for fee waivers that eliminate the per-page charge entirely.
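The arithmetic behind the waiver is simple enough to sketch. The example below uses the $0.10 per-page rate and the $30 quarterly threshold from the text; it assumes that a user who exceeds the threshold owes the full accrued amount rather than only the excess, and that a total exactly at the threshold is waived. It also ignores any per-document fee cap. All of these are modeling assumptions to check against the current fee schedule.

```python
from decimal import Decimal

PER_PAGE = Decimal("0.10")           # per-page rate from the text
QUARTERLY_WAIVER = Decimal("30.00")  # quarterly waiver threshold from the text


def quarterly_bill(page_counts, per_page=PER_PAGE, waiver=QUARTERLY_WAIVER):
    """Return (accrued, owed) for one quarter of page views.

    Assumptions: charges at or below the threshold are waived entirely,
    and charges above it are owed in full. No per-document cap is modeled.
    """
    accrued = sum((per_page * pages for pages in page_counts), Decimal("0"))
    owed = Decimal("0") if accrued <= waiver else accrued
    return accrued, owed


light_user = quarterly_bill([12, 30, 58])      # 100 pages in the quarter
heavy_user = quarterly_bill([120, 200, 80])    # 400 pages in the quarter
print("light:", light_user)
print("heavy:", heavy_user)
```

Decimal is used instead of float so that per-page charges sum exactly, which matters when a total sits right at the threshold.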

Carl Malamud, the RECAP extension, and the public mirror movement

The effort to make PACER documents freely and permanently available has been driven primarily by two organizations whose approaches, while complementary, are distinct.

Carl Malamud, founder of public.resource.org, has been the most confrontational advocate for public access to government documents. In 2008, Malamud used a pilot program that offered free PACER access at a small number of federal depository libraries to organize a mass download of PACER documents for republication at no cost. Working with him, the programmer Aaron Swartz retrieved roughly 20 million pages before the Administrative Office suspended the pilot in response. The FBI investigated the downloads but no charges were filed, and no court ever ruled on whether the documents could be redistributed. The published documents remained available, which in practice established that PACER documents, once downloaded, could be freely redistributed.

The RECAP project, originally a Princeton Center for Information Technology Policy initiative and now run by the Free Law Project, takes a distributed approach. The RECAP browser extension (available for Chrome and Firefox) intercepts PACER downloads and automatically mirrors each document to the RECAP Archive, a public repository accessible through CourtListener. When any RECAP user pays to download a PACER document, that document becomes freely available to everyone else who searches for it through RECAP. The archive has grown to over 300 million documents. The name is deliberate: RECAP is PACER spelled backward.
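The pay-once, share-with-everyone pattern can be illustrated with a toy model. This is not RECAP's implementation: the class, the in-memory dictionary standing in for the public archive, and the document identifier format are all hypothetical, and the paid download is simulated by a callback.

```python
from typing import Callable, Dict


class ArchiveFirstFetcher:
    """Check a shared archive first; pay only for documents it lacks."""

    def __init__(self, paid_fetch: Callable[[str], bytes]) -> None:
        self.archive: Dict[str, bytes] = {}  # stand-in for the public archive
        self.paid_fetch = paid_fetch         # stand-in for a paid download
        self.paid_downloads = 0

    def get(self, doc_id: str) -> bytes:
        if doc_id in self.archive:           # free path: someone already paid
            return self.archive[doc_id]
        content = self.paid_fetch(doc_id)    # paid path: first request only
        self.paid_downloads += 1
        self.archive[doc_id] = content       # contribute the copy back
        return content


# Hypothetical document identifier and fake paid download.
fetcher = ArchiveFirstFetcher(lambda doc_id: f"PDF bytes for {doc_id}".encode())
fetcher.get("nysd/1:23-cv-12345/doc-1")
fetcher.get("nysd/1:23-cv-12345/doc-1")
print("paid downloads:", fetcher.paid_downloads)
```

The design point is that the cost of a document is incurred at most once across all users, so the archive's coverage grows with the most requested filings first.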

The Free Law Project operates CourtListener, the most comprehensive publicly accessible federal court data platform. CourtListener hosts the RECAP Archive, provides a full-text search index of RECAP documents and scraped court opinions, and exposes a REST API that allows programmatic access to dockets, parties, attorneys, and opinions without per-page fees.

Accessing PACER data without per-page fees

Several access paths exist for researchers who need PACER data at scale without the per-page fees becoming prohibitive.

The PACER Case Locator at pcl.uscourts.gov/pcl/pages/search/findCase.jsf provides a national search across all courts simultaneously. It is free to use for searching (the fee applies only to viewing docket sheets and downloading documents) and returns case metadata including court, case number, case name, filing date, and party names. This is the starting point for national searches across the full federal court system.

The CourtListener API at courtlistener.com/api/rest/v3/ provides free programmatic access to dockets, opinions, oral argument recordings, judges, attorneys, and party information. A free API key provides access to the full dataset; the API is rate-limited but generous for research use. CourtListener's opinion index covers Supreme Court opinions back to 1754 and includes millions of federal circuit and district court opinions scraped from court websites and extracted from the RECAP Archive.

The SCALES-OKN project (Systematic Content Analysis of Litigation Events, Open Knowledge Network) is an academic collaboration that has assembled a dataset of over 2 million federal district court cases, including structured docket data, with research access available through affiliated universities.

For bulk historical access, public.resource.org maintains bulk downloads of the millions of documents Malamud originally acquired. The Internet Archive has also mirrored significant portions of the RECAP Archive for long-term preservation.

What you can learn from PACER data

Federal court data is underused as a research resource despite its breadth. Several analytical applications illustrate what is possible at scale.

Litigation trend analysis. The volume and composition of federal civil filings tracks economic conditions, regulatory enforcement priorities, and policy changes with a lag. The surge in False Claims Act qui tam complaints during COVID-19 relief program fraud enforcement, the wave of securities class actions following major earnings restatements, and the concentration of patent infringement litigation in the Eastern District of Texas are all visible in PACER filing data. The Administrative Office publishes aggregate filing statistics annually, but PACER docket data permits finer-grained analysis by judge, by statute, by party industry, or by law firm.
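As a rough illustration of this kind of trend analysis, filings can be tallied by year and case type directly from case numbers. The sketch assumes case numbers in the `office:yy-type-seq` form described earlier, assumes every two-digit year falls in the 2000s, and uses a made-up list of case numbers; a real analysis would also key on the court, since numbers repeat across courts.

```python
import re
from collections import Counter

# Optional "<office>:" prefix, then "<yy>-<type>-<sequence>".
CASE_RE = re.compile(r"(?:\d+:)?(\d{2})-([a-z]{2})-\d+")


def filings_by_year_and_type(case_numbers):
    """Count case numbers per (year, case type); skip unparseable entries."""
    counts = Counter()
    for number in case_numbers:
        match = CASE_RE.fullmatch(number.strip().lower())
        if match:
            year = 2000 + int(match.group(1))  # assumption: 20xx filings only
            counts[(year, match.group(2))] += 1
    return counts


# Hypothetical sample of case numbers from a single court.
sample = ["1:23-cv-00001", "1:23-cv-00002", "2:23-cr-00010",
          "1:22-cv-00900", "not a case number"]
for (year, case_type), n in sorted(filings_by_year_and_type(sample).items()):
    print(year, case_type, n)
```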

Corporate litigation exposure tracking. Public companies face continuing disclosure obligations for material litigation. But PACER data often provides earlier and more granular information than SEC filings about pending cases—which parties are involved, what the theory of liability is, which law firms are representing each side, and what the early procedural record suggests about the strength of the case. Hedge funds and institutional investors have been active PACER consumers for exactly this reason.

Judicial behavior research. Law professors and empirical legal scholars use PACER data to study how federal judges rule across different case types, how appointment politics correlates with outcomes, and how circuit splits develop across district courts. The CourtListener opinion dataset, which is fully searchable and downloadable, supports research that would have required months of manual case law review a generation ago.

Bankruptcy intelligence. Federal bankruptcy filings are a leading indicator of corporate distress. Chapter 11 petitions filed by significant companies appear in PACER—often before they are widely reported in the press—along with schedules of assets and liabilities that disclose the debtor's creditor list, the amounts owed, and the estimated value of assets. This information is public and machine-readable for anyone monitoring relevant courts.

CourtListener API: fetching recent federal opinions

The CourtListener API provides the most practical entry point for bulk PACER-derived data without per-page fees. The following Python example fetches recent federal district court opinions containing a keyword query, then parses the court name, docket number, case name, date filed, and download URL, printing the top 10 most recent results as a formatted table:

import json

import requests

# ---------------------------------------------------------------------------
# CourtListener REST API v3 -- Recent federal district court opinions
# Docs: https://www.courtlistener.com/api/rest/v3/
# Free API key required: https://www.courtlistener.com/sign-in/
# ---------------------------------------------------------------------------

BASE = "https://www.courtlistener.com/api/rest/v3"
API_KEY = "YOUR_API_KEY_HERE"   # replace with your CourtListener key

HEADERS = {
    "Authorization": f"Token {API_KEY}",
    "User-Agent": "research-bot/1.0 (contact: research@example.org)",
}

# ---------------------------------------------------------------------------
# Fetch recent federal district court opinions containing "trade secret"
# Court filter: a space-separated list of district court identifiers
# The "o" search type returns full written decisions (opinions).
# ---------------------------------------------------------------------------

params = {
    "type": "o",                   # search type: opinions
    "q": "trade secret",           # full-text keyword search
    "court": "dcd cacd nysd txsd casd ilnd flsd flmd", # major district courts
    "filed_after": "2024-01-01",   # recent opinions only
    "order_by": "dateFiled desc",  # most recent first
    "format": "json",
    "page_size": 10,
}

resp = requests.get(f"{BASE}/search/", params=params, headers=HEADERS, timeout=30)
resp.raise_for_status()
data = resp.json()

results = data.get("results", [])
print(f"Total matching opinions: {data.get('count', 0):,}")
print(f"Retrieved: {len(results)} results\n")

# ---------------------------------------------------------------------------
# Parse and display the top 10 most recent trade-secret opinions
# Fields: court name, docket number, case name, date filed, download URL
# ---------------------------------------------------------------------------

print(f"  {'#':<3}  {'Date Filed':<12}  {'Court':<8}  {'Docket':<20}  Case Name")
print("  " + "-" * 100)

for i, opinion in enumerate(results, start=1):
    # CourtListener nests case metadata under the cluster object
    cluster_url = opinion.get("cluster", "")

    # Fetch the cluster to get docket number and case name
    if cluster_url:
        cluster_resp = requests.get(cluster_url, headers=HEADERS, timeout=15)
        cluster_resp.raise_for_status()
        cluster = cluster_resp.json()
        case_name   = cluster.get("case_name", "Unknown")[:55]
        docket_url  = cluster.get("docket", "")
        date_filed  = cluster.get("date_filed", "")
    else:
        case_name = opinion.get("caseName", "Unknown")[:55]
        docket_url = ""
        date_filed = opinion.get("dateFiled", "")

    # Start from the fields on the search result itself, then follow the
    # docket URL (when present) for the docket number and court identifier
    docket_number = opinion.get("docketNumber") or "N/A"
    court_id      = opinion.get("court_id") or "N/A"
    if docket_url:
        docket_resp = requests.get(docket_url, headers=HEADERS, timeout=15)
        docket_resp.raise_for_status()
        docket_data   = docket_resp.json()
        docket_number = docket_data.get("docket_number") or docket_number
        court_id      = docket_data.get("court_id") or court_id

    # Build the link to the opinion page; prefer the path the API returns
    opinion_id   = opinion.get("id", "")
    absolute_url = opinion.get("absolute_url") or f"/opinion/{opinion_id}/"
    download_url = f"https://www.courtlistener.com{absolute_url}"

    # Format docket number for display (truncate if too long)
    docket_display = (docket_number[:18] + "..") if len(docket_number) > 20 else docket_number
    case_display   = (case_name[:53] + "..") if len(case_name) > 55 else case_name

    print(
        f"  {i:<3}  {str(date_filed):<12}  {court_id:<8}  "
        f"{docket_display:<20}  {case_display}"
    )
    print(f"       Download: {download_url}")
    print()

# ---------------------------------------------------------------------------
# Write full results to JSON for downstream analysis
# ---------------------------------------------------------------------------
with open("pacer_trade_secret_opinions.json", "w") as f:
    json.dump(results, f, indent=2, default=str)

print(f"\nWrote {len(results)} opinion records to pacer_trade_secret_opinions.json")
print("Each record includes cluster URL, court, date filed, and text download link.")

The script above illustrates the chained lookups that CourtListener's API uses for full docket metadata: first fetch the opinion, then follow the cluster URL to get case-level metadata, then follow the docket URL for the docket number and court identifier. For bulk research, these lookups should be batched and cached; a single opinion search can return thousands of results across dozens of courts. The API's page_size parameter accepts up to 100 results per request, and pagination is handled via the next link in the response JSON.
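Following the next link and caching repeated lookups can be sketched without touching the network. In the example below, `iter_results` and `fake_fetch` are hypothetical names, the two fake pages stand in for API responses, and the only response fields assumed are the `results` list and the `next` link mentioned above.

```python
from functools import lru_cache
from typing import Callable, Iterator, Optional


def iter_results(first_url: str,
                 fetch_json: Callable[[str], dict],
                 max_pages: Optional[int] = None) -> Iterator[dict]:
    """Yield every result, following the `next` link until it is empty."""
    url: Optional[str] = first_url
    pages = 0
    while url:
        page = fetch_json(url)
        yield from page.get("results", [])
        pages += 1
        if max_pages is not None and pages >= max_pages:
            break
        url = page.get("next")


# Stand-in for the network: two fake pages keyed by URL (hypothetical data).
FAKE_PAGES = {
    "page-1": {"results": [{"id": 1}, {"id": 2}], "next": "page-2"},
    "page-2": {"results": [{"id": 3}], "next": None},
}
calls = []


@lru_cache(maxsize=None)   # cache by URL so repeated lookups cost nothing
def fake_fetch(url: str) -> dict:
    calls.append(url)
    return FAKE_PAGES[url]


ids = [result["id"] for result in iter_results("page-1", fake_fetch)]
print(ids, "fetched with", len(calls), "requests")
```

In real use, `fetch_json` would wrap an HTTP GET that returns parsed JSON. The same cached function can then serve the cluster and docket lookups, since many opinions share a docket.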

The court filter string in the example above uses CourtListener's court identifier codes: dcd for the District of Columbia, cacd for the Central District of California, nysd for the Southern District of New York, txsd for the Southern District of Texas, and so on. CourtListener's full court list at courtlistener.com/api/rest/v3/courts/ provides identifiers for all supported courts, including bankruptcy courts and circuit courts.

PACER data in the Federal Data Hub

Federal court docket data intersects with nearly every other regulatory dataset in the Federal Data Hub. SEC Litigation Releases reference PACER case numbers for underlying civil complaints. DOJ False Claims Act settlements reference the district court cases in which qui tam complaints were filed under seal. FERC enforcement actions in federal appellate courts are docketed in PACER alongside the agency record. OFAC sanctions designations challenged by affected parties generate federal district court dockets that are the primary public record of those challenges.

The Federal Data Hub cross-references PACER case numbers against DOJ press releases, SEC Litigation Releases, and agency enforcement databases wherever those cross-references appear in the underlying documents. This allows a query like “show me all PACER cases associated with a given company or individual” to aggregate results across district courts, bankruptcy courts, and agency enforcement records simultaneously.
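The first step in this kind of cross-referencing is pulling case numbers out of free text. The sketch below is one possible approach, not the Hub's actual method: the regular expression covers only the case type codes listed earlier in this article, the function name is hypothetical, and the sample press-release sentence is invented. Real documents write case numbers in many other ways.

```python
import re

# Optional "<office>:" prefix, two-digit year, known type code, sequence.
CASE_IN_TEXT_RE = re.compile(
    r"\b(?:\d{1,2}:)?\d{2}-(?:cv|cr|mc|po|bk|ap)-\d{1,6}\b",
    re.IGNORECASE,
)


def extract_case_numbers(text: str) -> list:
    """Return distinct case numbers in order of first appearance."""
    found = []
    for match in CASE_IN_TEXT_RE.finditer(text):
        number = match.group(0).lower()
        if number not in found:
            found.append(number)
    return found


# Invented sample text for illustration.
release = ("The complaint was filed as Case No. 1:23-cv-12345 and later "
           "cited as 1:23-CV-12345; a related matter is 22-bk-00917.")
print(extract_case_numbers(release))
```

Because case numbers repeat across courts, each extracted number still needs to be paired with a court identifier before it can be joined against docket data.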

