
Foreign agents in plain sight: mapping DC's hidden influence network with FARA data

9 min read · AI Analytics
Regulatory data · FARA · Foreign influence · Lobbying

Every year, dozens of Washington lobbying firms register with the Department of Justice to represent Saudi Arabia, the UAE, Turkey, China, Israel, and scores of other foreign governments and their state-controlled enterprises. The legal obligation to do so has existed since 1938. The filings are public record. And yet the dataset that captures all of it—who represents whom, for how much, doing what—is buried inside an Oracle APEX URL that looks broken and is almost never referenced in journalism or policy research.

This post documents the FARA bulk data: what it is, where it lives, how to parse it, what the enforcement record looks like, and how cross-referencing it against LDA lobbying disclosures, OFAC sanctions, and federal contracts produces a picture of foreign influence that no single dataset can provide alone.

What FARA requires

The Foreign Agents Registration Act, passed in 1938 in response to Nazi propaganda operations in the United States, requires any person acting as an agent of a foreign government or foreign political party to register with the DOJ's National Security Division and disclose the nature of their activities, their foreign principal, and the compensation they receive. The statute covers political activities, public-relations work, lobbying of government officials, and “any other activity in the United States for or in the interests of” a foreign principal.

For most of its history, FARA was treated as an administrative formality. The law sat largely dormant for decades—DOJ sent zero criminal referrals for FARA violations between 1966 and 2016, a fifty-year enforcement gap that effectively communicated to the Washington influence industry that registration was optional. That changed abruptly in 2017 and 2018 when Paul Manafort and Michael Flynn became the first high-profile FARA prosecutions in modern memory, both stemming from the Mueller investigation. DOJ subsequently updated its guidance, established a dedicated FARA unit within the National Security Division, and began issuing more advisory opinions and referrals. The rate of new registrations rose noticeably after 2017.

A closely related instrument, the Lobbying Disclosure Act exemption under Section 3 of FARA, allows agents whose foreign principal's interests are not primarily those of a foreign government or party—a foreign commercial entity, for instance—to file under the LDA instead. These registrants appear in LDA filings as foreign-entity clients rather than in FARA. This carve-out is significant: it means the FARA dataset systematically understates foreign lobbying because commercial-interest work for state-owned enterprises often qualifies for the LDA exemption even when the principal is effectively government-controlled.

Where the data lives

DOJ publishes FARA bulk data at a URL that has defeated more than a few researchers on first encounter:

https://efile.fara.gov/ords/fara/f?p=API:BULKDATA

That is an Oracle APEX application URL. The page itself is minimal—four download links for four ZIP files, styled in Oracle's default APEX theme. It occasionally returns an ERR-7620 session-expired error, which resolves on refresh. There is no REST API, no pagination, no versioning, no documented schema beyond the column headers inside the CSVs themselves. The files are updated daily, though DOJ does not publish a changelog or timestamp in the file names; you have to download and compare to detect what changed.
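Since the file names carry no timestamp, one way to detect a change is to hash the contents of each archive and compare against the digests from the previous run. The sketch below is an illustration, not DOJ tooling or the hub's pipeline: the function names are hypothetical, and it hashes the uncompressed members rather than the ZIP bytes on the assumption that archive metadata may differ between downloads even when the CSV inside has not changed.

```python
import hashlib
import io
import zipfile


def member_digests(zip_bytes: bytes) -> dict[str, str]:
    """Return {member name: SHA-256 of its uncompressed bytes} for a ZIP archive."""
    digests = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            digests[info.filename] = hashlib.sha256(archive.read(info)).hexdigest()
    return digests


def changed_members(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Names that are new, removed, or whose content hash differs between runs."""
    names = set(previous) | set(current)
    return sorted(n for n in names if previous.get(n) != current.get(n))
```

Persisting the digest dictionary between runs (as JSON, for instance) is enough to decide whether a row-level diff is worth running on a given day.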

The four files are:

  • FARA_All_Registrants.zip — one row per registered agent (the DC firm or individual). Columns include registration number, registrant name, address, registration date, and termination date where applicable.
  • FARA_All_ForeignPrincipals.zip — one row per foreign principal relationship. Each registrant may have multiple principals. This file names the foreign government, state-owned enterprise, political party, or other foreign entity the registrant represents, along with the country of origin and the dates the relationship was active.
  • FARA_All_RegistrantDocs.zip — one row per filing document. Each registration generates multiple documents over time: registration statements, supplemental statements filed every six months, amendments, and exhibits. Every row includes a URL to the actual PDF or HTML document on the FARA e-file system.
  • FARA_All_ShortForms.zip — registrants who filed under the short-form exemption. Short forms apply to agents whose activities for a foreign principal are solely within the LDA lobbying definition and whose principal is not a foreign government or political party. These registrants disclose less detail than full FARA registrants.

All four files are ISO-8859-1 encoded, not UTF-8. Parse them accordingly or you will get silent corruption on non-ASCII characters in foreign entity names. The CSVs use standard comma delimiters with quoted fields; there are occasional unquoted commas inside address fields in older records.
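A minimal sketch of reading one of the archives with the encoding set explicitly, using only the Python standard library. It assumes each ZIP holds a single CSV member (the function takes the first one it finds); `read_fara_zip` is a hypothetical helper name, not part of any DOJ or hub API.

```python
import csv
import io
import zipfile


def read_fara_zip(zip_bytes: bytes, encoding: str = "iso-8859-1") -> list[dict]:
    """Read the first CSV member of a ZIP archive into a list of row dicts."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no CSV member found in archive")
        with archive.open(csv_names[0]) as raw:
            # newline="" lets the csv module handle line breaks inside quoted fields.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return list(csv.DictReader(text))
```

`csv.DictReader` collects any fields beyond the header count under the key `None`, so rows where `None in row` is true are candidates for the unquoted-comma problem in older address fields.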

What the data contains

As of 2025, the dataset holds approximately 700 active registrations and thousands of historical ones stretching back to the late 1990s in the electronic system (older paper filings exist in scanned form but are not in the bulk download). The active registrations span a striking range of foreign principals:

Saudi Arabia and Saudi-controlled entities are among the most heavily represented. Dozens of DC firms—including several of the largest lobbying shops—have active or recent FARA registrations for Saudi Aramco, the Saudi Embassy, the Saudi Arabian Cultural Mission, the Royal Commission for AlUla, and various Saudi Ministry offices. The dollar figures in the supplemental statements, which registrants must file every six months and which detail activities and compensation, run into the tens of millions annually across the Saudi principal universe.

The UAE has a similar footprint. Turkish government registrations are notably concentrated in PR and public affairs firms; the Turkish Ministry of Foreign Affairs and the Presidency of the Republic of Turkey appear as principals in registrations that span the period of the Erdogan government's most active US outreach. Chinese state media organizations—Xinhua, China Global Television Network, China Radio International—are registered as foreign missions rather than under FARA, but Chinese government-linked principals do appear in the FARA dataset through other channels. Israeli-linked think tanks and advocacy organizations appear in a subset of registrations; the line between foreign government direction and independent advocacy is contested in several of these cases, and DOJ advisory opinions on specific organizations have been sought and published.

The RegistrantDocs file is where the substantive intelligence lives. Supplemental statements must itemize every activity the registrant undertook during the six-month reporting period: meetings with Members of Congress and their staff, contacts with executive branch officials, media placements, events organized, and informational materials disseminated. Cross-referencing the activity descriptions with Congressional roll-call votes on bills mentioned in those descriptions is one of the more revealing analytical moves the dataset supports.

The enforcement gap

There have been thirteen criminal FARA prosecutions in the statute's entire history. Thirteen, across eighty-seven years. The enforcement record is not a curve that rises and falls with political attention—it is nearly a flatline, punctuated by the Manafort and Flynn cases and a handful of others. The practical consequence is that FARA registration has functioned, for most of its history, as a voluntary disclosure system with a legal obligation attached.

DOJ's standard response to apparent violations is a letter asking the subject to “come into compliance”—to register retroactively or to amend existing filings to reflect omitted activities. The FARA unit has issued dozens of such letters in the post-2017 period, and a meaningful number of registrations in the dataset are retroactive, filed years after the activities they describe. The supplemental statement date and the registration date together reveal which ones.

Experts who have studied the FARA system estimate that 50 percent or more of foreign agent activity that meets the statutory definition is never registered. The DOJ Inspector General published a report in 2016 that documented systemic weaknesses in DOJ's ability to identify unregistered agents and enforce the statute. A 2021 follow-up found that improvements had been made but gaps remained. What the dataset shows, in other words, is not the complete map of foreign influence operations in the United States. It is the disclosed portion of that map—the agents who calculated that disclosure was preferable to the risk of prosecution, or who were told by counsel to register, or who were specifically prodded by a DOJ letter to do so.

That caveat does not make the dataset useless. It makes it a lower bound. Patterns in the data (which countries have the most registered agents, which DC firms register for the most principals, which sectors of foreign-government activity generate the most supplemental statement pages) are real patterns in disclosed foreign influence, and disclosure systematically undercounts the underlying activity. The network is larger than what is visible here.

Cross-reference opportunities

The FARA dataset is most useful as an input to a multi-dataset analysis rather than a standalone source. Four joins are particularly productive:

LDA lobbying disclosures

The Lobbying Disclosure Act database, maintained by the Senate Office of Public Records, covers domestic lobbying. Many DC firms that have FARA registrations also have LDA filings for domestic clients in the same reporting period. Joining on firm name (with normalization for common variants like “LLC” versus “LLP” suffixes) reveals the full scope of a firm's government relations practice—who they represent domestically, who they represent for foreign principals, and where those interests might overlap on specific legislation. A firm lobbying for a domestic defense contractor while also registered to represent a foreign government on defense-procurement policy is a configuration worth flagging.
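The firm-name join can be sketched as follows. This is an illustration under stated assumptions rather than the hub's entity-resolution pipeline: the suffix list is a small made-up starting set, the function names are hypothetical, and the inputs are plain lists of name strings rather than real FARA or LDA rows.

```python
import re

# Assumed, incomplete list of corporate suffixes and filler words; extend for real data.
STOP_TOKENS = {"inc", "llc", "llp", "lp", "ltd", "corp", "pllc", "pc", "the"}


def normalize_firm(name: str) -> str:
    """Lowercase, drop punctuation and suffix tokens, then sort the remaining tokens."""
    # Removing periods first turns "L.L.C." into "llc" and "Inc." into "inc".
    tokens = re.findall(r"[a-z0-9]+", name.lower().replace(".", ""))
    return " ".join(sorted(t for t in tokens if t not in STOP_TOKENS))


def firms_in_both(fara_names, lda_names):
    """Map normalized key -> (FARA variants, LDA variants) for firms found in both."""
    def index(names):
        out = {}
        for name in names:
            key = normalize_firm(name)
            if key:  # skip names made up entirely of suffix tokens
                out.setdefault(key, set()).add(name)
        return out

    fara, lda = index(fara_names), index(lda_names)
    return {key: (fara[key], lda[key]) for key in fara.keys() & lda.keys()}
```

Keeping the original variants alongside the normalized key makes it easy to review false merges by hand, which token sorting will occasionally produce for short names.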

OFAC SDN and consolidated sanctions lists

The foreign principals in the FARA dataset include governments and entities that appear on OFAC sanctions lists—not usually the principals themselves, since FARA registration for a sanctioned entity would be straightforwardly illegal in most cases, but entities in the same country programs or corporate families as sanctioned parties. Joining the ForeignPrincipals country codes and entity names against the OFAC SDN and Non-SDN consolidated sanctions lists identifies registrations that sit adjacent to the sanctions perimeter. This matters for compliance teams at banks and asset managers who need to map their counterparties' relationships.

Federal contracts

USASpending.gov publishes federal contract and grant awards with recipient names and CAGE codes. Some foreign principals that appear in FARA filings—particularly foreign state-owned enterprises in defense, energy, and technology sectors—also receive US government contracts through US-domiciled subsidiaries. The combination of “registered foreign agent represents this entity” and “US government awards contracts to this entity's subsidiaries” describes a counterintelligence-relevant configuration that neither dataset surfaces alone.

Congressional roll-call votes

FARA supplemental statements must describe activities in enough detail to be auditable. Many describe specific meetings with congressional offices, specific legislative vehicles discussed, and specific policy positions advanced. Extracting bill numbers and Congressional contacts from the supplemental statement PDFs using document parsing—the RegistrantDocs URLs point to retrievable files—and then joining against roll-call vote records from Congress.gov or ProPublica's Congress API reveals whether legislators who were contacted by registered foreign agents subsequently voted in alignment with the positions the agents were paid to advance. This is not evidence of impropriety on its own; legislators meet with many lobbyists. But systematic patterns across multiple registrants and multiple votes are a starting point for investigation.
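Once text has been extracted from a supplemental statement, pulling out bill references is a pattern-matching problem. The sketch below assumes the common citation shapes (H.R., S., and the Res., J.Res., and Con.Res. variants) and is deliberately conservative; the pattern and the `extract_bills` name are illustrative, and real filings will contain formats it misses.

```python
import re

# Chamber letter, optional bill kind, then a 1-5 digit number. The lookbehind
# rejects an "S." that is the tail of an abbreviation such as "U.S. 50".
BILL_RE = re.compile(
    r"(?<![A-Za-z]\.)\b(?P<chamber>H|S)\.?\s?"
    r"(?P<kind>R|(?:J\.?\s?|Con\.?\s?)?Res)?\.?\s?"
    r"(?P<number>\d{1,5})\b"
)
KIND_LABELS = {"Res": "Res.", "JRes": "J.Res.", "ConRes": "Con.Res."}


def extract_bills(text: str) -> list[str]:
    """Return sorted, de-duplicated bill citations in a canonical spelling."""
    found = set()
    for m in BILL_RE.finditer(text):
        chamber, kind, number = m["chamber"], m["kind"], int(m["number"])
        if kind is None:
            if chamber != "S":  # a bare "H 123" is not a bill citation
                continue
            label = "S."
        elif kind == "R":
            if chamber != "H":  # "S.R." is not a bill type
                continue
            label = "H.R."
        else:
            compact = re.sub(r"[.\s]", "", kind)  # "J. Res" -> "JRes"
            label = f"{chamber}.{KIND_LABELS[compact]}"
        found.add(f"{label} {number}")
    return sorted(found)
```

The canonical strings still need a Congress number before they can be joined to roll-call records, since bill numbers restart every two years; the filing's reporting period is the natural source for that.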

Loading the dataset

The ingest pipeline for the FARA tables on the Federal Data Hub handles the Oracle APEX retrieval, the ISO-8859-1 decoding, the address-field comma normalization in older records, and the daily delta detection. The four tables are joined at query time on RegistrationNumber, which is the stable key across all four files. The RegistrantDocs table includes a DocType column that distinguishes registration statements from supplemental statements from amendments, so you can filter to the six-month activity reports without fetching every exhibit.
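For readers working from the raw CSVs instead, the same join can be approximated in a few lines. This sketch assumes rows are dictionaries with `RegistrationNumber` and `DocType` keys as named above; the literal value "Supplemental Statement" and the function name are assumptions, so check the values that actually appear in the `DocType` column before relying on it.

```python
from collections import defaultdict


def attach_docs(registrants, docs, doc_type="Supplemental Statement"):
    """Attach each registrant's documents of one DocType, keyed on RegistrationNumber."""
    docs_by_reg = defaultdict(list)
    for doc in docs:
        # Case-insensitive match, since the exact DocType spelling is an assumption.
        if doc.get("DocType", "").strip().lower() == doc_type.lower():
            docs_by_reg[doc["RegistrationNumber"].strip()].append(doc)
    return [
        {**reg, "docs": docs_by_reg.get(reg["RegistrationNumber"].strip(), [])}
        for reg in registrants
    ]
```

Registrants with no matching documents keep an empty `docs` list rather than being dropped, which preserves the left-join semantics needed when counting registrations.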

# All active FARA registrations with their foreign principals
GET https://api.ai-analytics.org/datasets/fara-registrations

# Registrants for a specific country
GET https://api.ai-analytics.org/datasets/fara-registrations?country=Saudi+Arabia

# All documents for a specific registrant
GET https://api.ai-analytics.org/datasets/fara-registrations/docs?registration_number=6926

# Cross-reference: registrants that also appear in LDA lobbying data
GET https://api.ai-analytics.org/datasets/fara-registrations?join=lda&country=China

The response schema follows the standard hub envelope: a data array of records, a meta block with total count and pagination cursor, and _source links pointing to the original DOJ documents for each record. The hub dataset page at https://api.ai-analytics.org/datasets/fara-registrations includes the full field reference, example queries, and a data dictionary derived from the column headers in the four DOJ ZIP files.
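Walking a cursor-paginated envelope of this shape might look like the sketch below. The key names are assumptions: `data` and `meta` follow the description above, but `next_cursor` is a hypothetical name for the pagination cursor, and the HTTP call is left to a caller-supplied function so the loop can be exercised without network access. Consult the field reference on the dataset page for the real names.

```python
from typing import Callable, Iterator, Optional


def iter_records(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every record across pages until the envelope carries no cursor.

    fetch_page(cursor) must return one decoded envelope; cursor is None
    for the first request.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("data", [])
        # "next_cursor" is an assumed key name inside the meta block.
        cursor = page.get("meta", {}).get("next_cursor")
        if not cursor:
            break


# Usage with an in-memory stand-in for the API:
_pages = {
    None: {"data": [{"id": 1}, {"id": 2}], "meta": {"total": 3, "next_cursor": "abc"}},
    "abc": {"data": [{"id": 3}], "meta": {"total": 3}},
}
records = list(iter_records(lambda cursor: _pages[cursor]))
```

Injecting the fetch function keeps retry logic, authentication, and the cursor's query-parameter name out of the pagination loop.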

Practical notes

  • Registration numbers are not sequential by date. DOJ assigns registration numbers in the order applications are processed, not in the order the underlying activities began. A retroactive registration can have a higher registration number than a registration for activities that started years later. Use the registration date column, not the registration number, for time-series work.
  • Termination dates are not reliable absence indicators. A registration with no termination date is “active” in the DOJ system, but registrants sometimes simply stop filing supplemental statements without formally terminating. Check the date of the most recent supplemental statement in the RegistrantDocs file to assess whether a registration is genuinely ongoing.
  • Name normalization is essential before any join. The same DC firm appears under multiple name variants across the four files and across LDA filings: “Podesta Group Inc.” versus “The Podesta Group” versus “Podesta Group, Inc”. A token-sorted normalized form plus suffix stripping handles most cases; phonetic normalization is needed for name changes after mergers.
  • The short-form file is a separate population. Short-form registrants do not appear in FARA_All_Registrants; they are in FARA_All_ShortForms only. Any analysis that sums total registered agents must include both files. The short-form population tends to be smaller individual practitioners rather than large DC firms.
  • Document parsing requires PDF extraction. The supplemental statement PDFs are the richest source of activity detail, but they are PDFs, often scanned for older filings. Text-layer PDFs from 2015 onward are parseable with standard PDF libraries. Filings from before 2010 are often image scans and require OCR. The most valuable fields—specific legislative contacts, bill numbers, dollar amounts paid for specific activities—are in the document body, not in the CSV metadata.
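The termination-date note above suggests checking the most recent supplemental statement for each registration. A rough sketch of that check follows, assuming document rows carry `RegistrationNumber` and `DocType` plus a filing-date column; the column name `DocDate`, the ISO date format, and the one-year threshold (two missed six-month statements) are all illustrative assumptions.

```python
from datetime import date, datetime, timedelta


def stale_registrations(docs, today, max_gap_days=365,
                        date_col="DocDate", date_fmt="%Y-%m-%d"):
    """Return {registration number: last supplemental date} for lapsed filers."""
    latest = {}
    for doc in docs:
        if doc.get("DocType") != "Supplemental Statement":
            continue
        filed = datetime.strptime(doc[date_col], date_fmt).date()
        reg = doc["RegistrationNumber"]
        if reg not in latest or filed > latest[reg]:
            latest[reg] = filed
    cutoff = today - timedelta(days=max_gap_days)
    return {reg: last for reg, last in latest.items() if last < cutoff}
```

Registrations that have never filed a supplemental statement do not appear in the result at all, so they need a separate pass against the registrants file.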

For the compliance screening endpoint that cross-references FARA principals against OFAC sanctions and 30+ federal enforcement lists in a single call: Compliance screening across 30+ federal enforcement lists: how the risk score works →

For how LDA lobbying disclosures are ingested and joined against FARA and federal contracts data: Lobbying Disclosure Act data: ingesting Senate SOPR bulk files and joining against FARA →

For the entity resolution pipeline that normalizes firm names across FARA, LDA, USASpending, and the 30+ enforcement lists: Building the cross-agency regulatory entity graph: 50M+ records, one join →