
Follow the money: mapping dark money and super PAC flows with FEC bulk data

· AI Analytics
Regulatory data, FEC, Campaign finance, Super PAC, Dark money, Political money

The Federal Election Commission publishes, for free, a complete record of every itemized contribution and expenditure in federal elections going back to 1980. Individual donors, PAC transfers, super PAC receipts, independent expenditure campaigns, operating vendor payments—all of it is downloadable as CSV. The hard part is not obtaining the data. The hard part is understanding the structural layer that sits between the disclosed money and the actual money: the 501(c)(4) nonprofit conduit that the FEC cannot see into and that Congress has declined to illuminate.

The bulk data files

The FEC bulk data portal lives at fec.gov/data/browse-data/?tab=bulk-data. Data is organized by two-year election cycle. Files are numbered by cycle year suffix: 24 for the 2023–2024 cycle, 22 for 2021–2022, and so on back to 80 for the 1979–1980 cycle. Within each cycle there are seven primary tables, each a pipe-delimited flat file inside a ZIP archive.

The most important files for campaign finance analysis are:

  • indiv{YY}.zip — Itemized individual contributions. Committees must itemize a donor once that donor's aggregate giving exceeds $200, so smaller donors do not appear. Fields include NAME, ZIP_CODE, EMPLOYER, OCCUPATION, TRANSACTION_AMT, TRANSACTION_DT, CMTE_ID (the receiving committee), and MEMO_TEXT. This is the primary source for donor-to-committee contribution tracing.
  • pas2{YY}.zip — Contributions from PACs and party committees to candidate committees (Schedule B filings from the giving committee, Schedule A from the receiving side). Contains CMTE_ID (giver), CAND_ID (candidate), and TRANSACTION_AMT. Use this to find which PACs funded which candidates.
  • itpas2{YY}.zip — PAC-to-PAC contributions. This is the critical file for dark money analysis. Money flowing from a 501(c)(4)-affiliated super PAC to another committee appears here. It is the table where shell layers are most visible.
  • oppexp{YY}.zip — Operating expenditures: every vendor payment made by committees. Useful for mapping which consulting firms, media buyers, and polling operations are doing business with which campaigns. The PAYEE_NAME field is not normalized.
  • cm.zip — Committee master file. Maps every CMTE_ID to a committee name, type code, designation code, party affiliation, treasurer name, and filing address. This file is the key to interpreting every other table.
  • ccl.zip — Committee-candidate linkage. Joins CMTE_ID to CAND_ID so you can trace which committees are formally affiliated with which candidates.
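As a minimal sketch of reading one of these archives with only the standard library: it assumes each ZIP holds a single pipe-delimited text file with no header row, and that the column names are supplied by the caller from the FEC's data dictionary. The helper names cycle_suffix and read_bulk_rows are illustrative, not FEC tooling, and the sample records are made up.

```python
import csv
import io
import zipfile


def cycle_suffix(year: int) -> str:
    """Return the two-digit suffix of the two-year cycle containing `year`."""
    cycle_end = year if year % 2 == 0 else year + 1  # cycles end in even years
    return f"{cycle_end % 100:02d}"


def read_bulk_rows(zip_source, columns):
    """Yield one dict per record from the first member of a bulk ZIP archive.

    `zip_source` is a path or file-like object. `columns` is the header list
    for that file (assumed to be supplied by the caller, because the data
    file itself is assumed to have no header row).
    """
    with zipfile.ZipFile(zip_source) as archive:
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="latin-1", newline="")
            # QUOTE_NONE: stray quote characters in names are data, not quoting
            reader = csv.reader(text, delimiter="|", quoting=csv.QUOTE_NONE)
            for fields in reader:
                yield dict(zip(columns, fields))


# Build a tiny in-memory archive to show the round trip (made-up records).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "sample.txt",
        'C00000001|EXAMPLE "QUOTED" PAC|O\nC00000002|OTHER CMTE|Q\n',
    )
buffer.seek(0)

rows = list(read_bulk_rows(buffer, ["CMTE_ID", "CMTE_NM", "CMTE_TYPE"]))
super_pac_ids = [r["CMTE_ID"] for r in rows if r["CMTE_TYPE"] == "O"]
print(cycle_suffix(2023), super_pac_ids)
```

Streaming rows this way keeps memory flat on the individual contribution file, which is far larger than the committee master. Disabling quote handling is a deliberate choice here, since free-text name fields can contain unbalanced quote characters.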

Committee type codes

The committee master file uses single-character type codes that determine how a committee is classified under federal law. These codes are essential for filtering the dataset to the committee category you are analyzing. The full taxonomy is:

  • C — Communication cost filer
  • D — Delegate committee
  • E — Electioneering communication filer
  • H — House candidate committee
  • I — Independent expenditure filer (a person or group that is not a political committee)
  • N — PAC, nonqualified
  • O — Super PAC (IE-only committee, the post-Citizens United vehicle)
  • P — Presidential candidate committee
  • Q — PAC, qualified
  • S — Senate candidate committee
  • U — Single-candidate independent expenditure committee
  • V — Hybrid PAC with a non-contribution account, nonqualified
  • W — Hybrid PAC with a non-contribution account, qualified
  • X — Party committee, nonqualified
  • Y — Party committee, qualified
  • Z — National party nonfederal account

For super PAC analysis, filter cm.zip on CMTE_TYPE = 'O', and add V and W if hybrid PACs, which run an IE-only account alongside a conventional PAC account, belong in scope. Type I does not identify 527 groups: it covers persons and groups other than political committees that report independent expenditures, and 527 organizations that register only with the IRS do not appear in the committee master at all. Traditional PACs (N and Q) include many of the ideological committees that are legally distinct from candidate committees but operationally coordinated through shared vendors and personnel.

Super PACs versus dark money: the structural gap

The terminology in campaign finance reporting is frequently imprecise. The distinction matters legally and analytically.

A super PAC is an independent expenditure-only committee (committee type O). Under the rules established by SpeechNow.org v. FEC (D.C. Circuit, 2010) and the FEC's subsequent advisory opinions, super PACs may raise unlimited contributions from corporations, unions, and individuals, but may not make direct contributions to candidates or coordinate with their campaigns. Super PACs must disclose all donors to the FEC on Schedule A. That donor list is public and searchable. If a hedge fund manager gives $5 million to a super PAC, that contribution is in indiv{YY}.zip.

Dark money refers to funds that influence elections but are never disclosed to the FEC because they flow through organizations not required to file with the commission. The primary vehicle is the 501(c)(4) social welfare organization. Under IRS rules, a 501(c)(4) may engage in political activity as long as that activity is not its “primary purpose”—a standard that has proven nearly impossible to enforce. A 501(c)(4) is not required to disclose its donors to the public under any federal law currently in force.

The shell-company layer works as follows. A 501(c)(4) raises money from donors who want political influence without public attribution. The 501(c)(4) transfers funds to a super PAC it controls or is aligned with. The super PAC's Schedule A then lists the 501(c)(4) organization as the donor—not the underlying individuals. The individual donors to the 501(c)(4) are invisible in FEC data. What appears in itpas2{YY}.zip is the transfer from the nonprofit to the super PAC, with the nonprofit's name as the contributor. You know a check arrived. You do not know who funded the check-writer.

Citizens United v. FEC (2010) removed the prohibition on independent corporate and union expenditures in federal elections. SpeechNow.org v. FEC (2010) extended this to individual contributions to IE-only groups. The combination created the legal infrastructure for both super PACs and the 501(c)(4) conduit layer. The IRS application backlog that followed the 2012 election cycle—during which the Tax Exempt Organizations office in Cincinnati created the internal BOLO lists that became a significant political controversy—had the practical effect of allowing dozens of 501(c)(4) organizations to operate without IRS determination letters for years, further cementing the dark money ecosystem.

Identifying shell-company donors in the individual file

The indiv{YY}.zip file includes EMPLOYER and OCCUPATION fields that donors provide on contribution forms. These fields are self-reported and not verified by the FEC. They are, however, analytically valuable for identifying suspicious donation patterns.

Several signals in the individual file suggest a donor may be acting as a shell or conduit:

  • EMPLOYER field contains an LLC with no public business presence. Cross-referencing the employer name against SEC EDGAR, state secretary of state filings, and LinkedIn reveals whether the entity has any operational employees. An LLC with one registered agent and no web presence that contributes $100,000 in a single cycle is a structuring signal.
  • Contributions clustered within 48 hours of each other from different donors at the same ZIP code. The FEC's own compliance literature describes “conduit” schemes in which a central actor reimburses individuals for making contributions that appear to be independent. The amount, timing, and geographic clustering of small tranches from the same employer/ZIP combination is a pattern worth flagging.
  • Occupation listed as “investor,” “self-employed,” or “homemaker” with contributions exceeding $50,000. These occupations are disproportionately used by donors who want to minimize employer attribution. They are not inherently suspicious, but the combination of vague occupation, large amount, and timing near a major election is worth examining.
  • Transactions timestamped in the last 10 days before an election filing deadline. FEC reporting deadlines create incentives to time contributions so they fall in a later reporting period. Last-minute large contributions from entities with opaque ownership structures are a consistent dark money signal.
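The clustering signal above can be sketched as a sliding window over each ZIP code. This is an illustration under stated assumptions: records are already parsed into dicts with NAME, ZIP_CODE, and a date value, and the min_donors threshold of 3 is a hypothetical choice, not an FEC rule. The sample donors are made up.

```python
from collections import defaultdict
from datetime import date, timedelta


def flag_zip_clusters(records, window=timedelta(hours=48), min_donors=3):
    """Return (zip5, donor_names) pairs where at least `min_donors` distinct
    donors in one five-digit ZIP gave within `window` of each other."""
    by_zip = defaultdict(list)
    for rec in records:
        by_zip[rec["ZIP_CODE"][:5]].append(rec)

    flagged = []
    for zip5, recs in by_zip.items():
        recs.sort(key=lambda r: r["date"])
        start = 0
        best = set()
        for end in range(len(recs)):
            # Advance the left edge until the window spans at most `window`
            while recs[end]["date"] - recs[start]["date"] > window:
                start += 1
            donors = {r["NAME"] for r in recs[start:end + 1]}
            if len(donors) > len(best):
                best = donors
        if len(best) >= min_donors:
            flagged.append((zip5, sorted(best)))
    return flagged


sample = [
    {"NAME": "DOE, JANE", "ZIP_CODE": "100010001", "date": date(2024, 10, 28)},
    {"NAME": "ROE, RICHARD", "ZIP_CODE": "10001", "date": date(2024, 10, 29)},
    {"NAME": "POE, ALEX", "ZIP_CODE": "10001", "date": date(2024, 10, 29)},
    {"NAME": "DOE, JANE", "ZIP_CODE": "10001", "date": date(2024, 6, 1)},
    {"NAME": "LEE, SAM", "ZIP_CODE": "94105", "date": date(2024, 10, 29)},
]
clusters = flag_zip_clusters(sample)
print(clusters)
```

A flag from this kind of filter is a lead for manual review, not evidence of a conduit scheme. Dense urban ZIP codes will produce many innocent clusters, so in practice the filter would likely be combined with the employer and amount signals.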

The dark money gap: what FEC data cannot show

501(c)(4) organizations file IRS Form 990 annually, which is public under 26 U.S.C. § 6104. The 990 discloses total revenues, total political expenditures, and the five highest-paid employees. It does not disclose donor names for 501(c)(4) organizations (unlike 501(c)(3) public charities, which disclose major donors on Schedule B only to the IRS, not publicly). The practical consequence: you can see that a 501(c)(4) received $47 million in a given tax year, but you cannot determine from the 990 who contributed that $47 million.

The canonical research tool for dark money is the OpenSecrets Dark Money database, which cross-references FEC transfer data with IRS 990 filings using ProPublica's Nonprofit Explorer API and the NCCS data archive. The methodology is: identify every 501(c)(4) that appears as a contributor in itpas2{YY}.zip, retrieve their 990 filings, and map the total transfer against reported political expenditures. FollowTheMoney.org covers state-level data and provides some additional coverage for organizations that operate in state elections but not at the federal level.

The IRS 990 data is available in bulk from the IRS at irs.gov/charities-non-profits/form-990-series-downloads as XML files indexed by EIN. ProPublica mirrors and indexes these as JSON via their Nonprofit Explorer API. The relevant 990 fields for dark money mapping are: Part I Line 4 (political activity), Schedule C Part II (expenditures for public office), and Schedule C Part III (disclosure of public office expenditures by year). The total from Schedule C is the upper bound on FEC-visible transfers; 501(c)(4)s may also run issue ads that are not classified as independent expenditures under FEC rules, which do not appear in either the FEC data or Schedule C.

Entity resolution across election cycles

The FEC does not assign stable identifiers to individual human donors. The CMTE_ID for a committee is persistent—once assigned, it stays with that committee across cycles. But the individual donor records in indiv{YY}.zip have no unique person identifier. The same donor across multiple cycles must be identified by constructing a matching key from NAME, ZIP_CODE, EMPLOYER, and OCCUPATION. This is the “fuzzy donor” problem.

The standard approach combines exact and approximate matching in stages:

  1. Normalize NAME. Strip legal suffixes (Jr., Sr., III), expand common abbreviations (Wm. → William), and apply NFKC Unicode normalization. The FEC name field is last-name-first, comma-separated: SMITH, JOHN A. Parse accordingly before matching.
  2. ZIP+4 clustering. Use the five-digit ZIP as a blocking key. Donors with identical five-digit ZIPs who share a normalized name are strong candidates for the same person. The +4 extension is inconsistently provided and should not be used as a required match field.
  3. Jaro-Winkler name similarity. Within ZIP blocks, apply Jaro-Winkler string similarity at a threshold of 0.88 to catch name-field variation across cycles (typos, nickname expansion, middle initial presence/absence).
  4. EMPLOYER cross-validation. Where name similarity is ambiguous (Jaro-Winkler 0.82–0.88), use EMPLOYER agreement as a tiebreaker. Two records with the same normalized employer and matching ZIP are likely the same person even if the name has minor variation.
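The staged approach above can be sketched in plain Python. This is a simplified illustration, not the OpenSecrets method: the Jaro-Winkler implementation is written out by hand so that the block has no dependencies, abbreviation expansion (Wm. to William) is omitted, and the helper names normalize_name and same_donor are hypothetical. The two sample records are made up.

```python
import re
import unicodedata

SUFFIXES = {"JR", "SR", "II", "III", "IV"}


def normalize_name(raw: str) -> str:
    """Turn 'SMITH, JOHN A. JR' into 'JOHN A SMITH' (uppercase, no suffixes)."""
    text = unicodedata.normalize("NFKC", raw).upper()
    last, _, rest = text.partition(",")
    tokens = re.findall(r"[^\W\d_]+", f"{rest} {last}")
    return " ".join(t for t in tokens if t not in SUFFIXES)


def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):  # shared prefix, capped at 4 characters
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def same_donor(a: dict, b: dict, hi: float = 0.88, lo: float = 0.82) -> bool:
    """Staged match: ZIP block, then name similarity, then employer tiebreak."""
    if a["ZIP_CODE"][:5] != b["ZIP_CODE"][:5]:
        return False
    score = jaro_winkler(normalize_name(a["NAME"]), normalize_name(b["NAME"]))
    if score >= hi:
        return True
    if score >= lo:
        return a["EMPLOYER"].strip().upper() == b["EMPLOYER"].strip().upper()
    return False


rec_2022 = {"NAME": "SMITH, JOHN A.", "ZIP_CODE": "606011234", "EMPLOYER": "ACME LLC"}
rec_2024 = {"NAME": "SMITH, JON A", "ZIP_CODE": "60601", "EMPLOYER": "Acme LLC"}
print(same_donor(rec_2022, rec_2024))
```

Comparing every pair of records is quadratic, which is why the five-digit ZIP is used as a blocking key first: only records inside the same block need to be scored against each other.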

The OpenSecrets methodology for donor entity resolution is described in their data documentation and produces a contrib_id field that persists across cycles in their bulk downloads. For raw FEC data, you are building this yourself.

Building the candidate-PAC network graph

The most analytically productive structure for campaign finance data is a directed weighted graph where nodes are committee IDs and edges are contribution flows. This reveals community structure—clusters of committees that share money—and surfaces the hub organizations that distribute funds across many candidates.

Construction steps using Python and NetworkX:

import pandas as pd
import networkx as nx

# Load committee master
cm = pd.read_csv(
    'cm.zip', sep='|', header=None,
    names=['CMTE_ID','CMTE_NM','TRES_NM','CMTE_ST1','CMTE_ST2',
           'CMTE_CITY','CMTE_ST','CMTE_ZIP','CMTE_DSGN',
           'CMTE_TYPE','CMTE_PTY_AFFILIATION','CMTE_FILING_FREQ',
           'ORG_TP','CONNECTED_ORG_NM','CAND_ID'],
    encoding='latin-1'
)

# Isolate super PACs
super_pacs = cm[cm['CMTE_TYPE'] == 'O'][['CMTE_ID', 'CMTE_NM']]

# Load PAC-to-PAC contributions
pac_to_pac = pd.read_csv(
    'itpas224.zip', sep='|', header=None,
    names=['CMTE_ID','AMNDT_IND','RPT_TP','TRANSACTION_PGI',
           'IMAGE_NUM','TRANSACTION_TP','ENTITY_TP','NAME',
           'CITY','STATE','ZIP_CODE','EMPLOYER','OCCUPATION',
           'TRANSACTION_DT','TRANSACTION_AMT','OTHER_ID',
           'TRAN_ID','FILE_NUM','MEMO_CD','MEMO_TEXT','SUB_ID'],
    encoding='latin-1'
)

# Filter to flows >= $500k where the filing committee is a super PAC.
# Rows without OTHER_ID have no counterparty committee to link to, so drop them.
large_flows = pac_to_pac[
    (pac_to_pac['TRANSACTION_AMT'] >= 500_000) &
    (pac_to_pac['CMTE_ID'].isin(super_pacs['CMTE_ID'])) &
    (pac_to_pac['OTHER_ID'].notna())
].copy()

# Build directed graph; repeated transfers between the same pair are summed
# into a single weighted edge
G = nx.DiGraph()
edge_totals = large_flows.groupby(['CMTE_ID', 'OTHER_ID'])['TRANSACTION_AMT'].sum()
for (src, dst), amt in edge_totals.items():
    G.add_edge(src, dst, weight=float(amt))

# Add committee name attributes (fall back to the raw ID when unknown)
names = cm.drop_duplicates('CMTE_ID').set_index('CMTE_ID')['CMTE_NM']
for nid in G.nodes():
    G.nodes[nid]['label'] = names.get(nid, nid)

total = large_flows['TRANSACTION_AMT'].sum()
print("Nodes:", G.number_of_nodes(), "Edges:", G.number_of_edges())
print("Total flow (USD):", int(total))

# Community detection on undirected projection
G_undirected = G.to_undirected()
communities = nx.community.louvain_communities(G_undirected, seed=42)
print("Communities detected:", len(communities))

The OTHER_ID field in the PAC-to-PAC file contains the receiving committee's CMTE_ID. Note that not every row has a populated OTHER_ID, and contributions from individuals (in indiv{YY}.zip) use a different field structure. For PAC-to-candidate flows from pas2{YY}.zip, the receiving identifier is in the CAND_ID field, not OTHER_ID, and you will need to join through ccl.zip to find the candidate's authorized committee.
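A minimal sketch of that linkage join, using plain dicts in place of the loaded files. The IDs and amounts are made up, only the fields needed for the join are shown, and the sketch assumes that a CMTE_DSGN value of P in the linkage file marks the candidate's principal campaign committee.

```python
from collections import defaultdict

# Made-up linkage rows in the shape of ccl records (only the fields used here).
ccl_rows = [
    {"CAND_ID": "H0XX00001", "CMTE_ID": "C00000010", "CMTE_DSGN": "P"},
    {"CAND_ID": "H0XX00001", "CMTE_ID": "C00000011", "CMTE_DSGN": "J"},
    {"CAND_ID": "S0XX00002", "CMTE_ID": "C00000020", "CMTE_DSGN": "P"},
]

# Made-up PAC-to-candidate rows in the shape of pas2 records.
pas2_rows = [
    {"CMTE_ID": "C00000099", "CAND_ID": "H0XX00001", "TRANSACTION_AMT": "5000"},
    {"CMTE_ID": "C00000099", "CAND_ID": "S0XX00002", "TRANSACTION_AMT": "2500"},
    {"CMTE_ID": "C00000098", "CAND_ID": "H0XX00001", "TRANSACTION_AMT": "1000"},
    {"CMTE_ID": "C00000098", "CAND_ID": "P0XX00003", "TRANSACTION_AMT": "700"},
]

# Assumption: designation 'P' marks the principal campaign committee.
principal = {
    row["CAND_ID"]: row["CMTE_ID"]
    for row in ccl_rows
    if row["CMTE_DSGN"] == "P"
}

# Aggregate giver -> candidate-committee totals, ready to become graph edges.
edges = defaultdict(float)
unlinked = 0
for row in pas2_rows:
    dst = principal.get(row["CAND_ID"])
    if dst is None:
        unlinked += 1  # no principal committee on file for this candidate
        continue
    edges[(row["CMTE_ID"], dst)] += float(row["TRANSACTION_AMT"])

for (src, dst), amount in sorted(edges.items()):
    print(src, "->", dst, amount)
print("rows without a linked committee:", unlinked)
```

Each (src, dst) pair can then be added to the same graph with G.add_edge(src, dst, weight=amount), so candidate committees and PAC-to-PAC transfers share one node namespace. Counting the unlinked rows is worth keeping, because a silent drop would understate flows to candidates whose linkage record is missing.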

The Louvain community detection algorithm on the undirected projection of the money-flow graph reliably surfaces ideologically aligned committee clusters: Senate campaign committees plus affiliated super PACs plus allied 527 groups cluster together, separated from House members in competitive districts who pull from different donor networks. In a 2024-cycle analysis, the top-20 super PACs by total disbursement sorted cleanly into three communities corresponding to presidential races, Senate battlegrounds, and House majority organizations.

The OpenFEC REST API

The FEC also offers a REST API at api.open.fec.gov/v1/ that covers the current cycle in near real time and includes all historical cycles going back to 1979. The API is rate-limited to 1,000 requests per hour for registered API key holders (100 per hour anonymous). Register at api.open.fec.gov/developers/ to obtain an API key.

The three most useful OpenFEC endpoints for campaign finance mapping are:

  • /v1/schedules/schedule_a/ — Schedule A receipts. Filter by committee_id, contributor_type, min_amount, max_date. Returns contributor name, address, employer, occupation, amount, date. Supports contributor_type=committee to isolate PAC-to-PAC and nonprofit-to-PAC transfers.
  • /v1/schedules/schedule_b/ — Schedule B disbursements. Filter by committee_id and recipient_name. Covers operating expenditures, independent expenditures, and transfers out.
  • /v1/schedules/schedule_e/ — Schedule E independent expenditures. IE-only committees (super PACs) must report independent expenditures over $250. This endpoint returns the payee, the candidate supported or opposed, the communication type (TV, digital, mail), and the amount. The support_oppose_indicator field distinguishes pro-candidate from anti-candidate spending.

import requests

API_KEY = 'YOUR_API_KEY'
BASE = 'https://api.open.fec.gov/v1'

# All IE expenditures by super PACs opposing a candidate
params = {
    'api_key': API_KEY,
    'support_oppose_indicator': 'O',      # oppose
    'committee_type': 'O',               # super PAC
    'min_amount': 100_000,
    'election_full': True,
    'cycle': 2024,
    'sort': '-expenditure_amount',
    'per_page': 100,
    'page': 1,
}

r = requests.get(f'{BASE}/schedules/schedule_e/', params=params, timeout=30)
r.raise_for_status()  # fail loudly on HTTP errors such as rate limiting
data = r.json()

for ie in data['results']:
    print(
        ie['committee']['name'],
        ie['candidate_name'],
        ie['expenditure_amount'],
        ie['expenditure_date']
    )

The API returns paginated JSON using last_indexes cursor pagination rather than page numbers for large result sets. For datasets above a few thousand records, switch to the bulk download files rather than the API to avoid pagination overhead.
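A hedged sketch of following that cursor: the loop copies whatever keys appear under pagination.last_indexes back into the next request, since the exact key names depend on the endpoint and sort order. The iter_results helper and the fake_fetch stand-in are illustrative, and the canned pages are made up so the loop can be exercised without network access.

```python
def iter_results(fetch_page, params, max_pages=50):
    """Yield records across pages using the cursor in pagination.last_indexes.

    `fetch_page` is any callable that takes a params dict and returns the
    decoded JSON body, for example:
        lambda p: requests.get(url, params=p, timeout=30).json()
    `max_pages` is a safety cap so a bad cursor cannot loop forever.
    """
    query = dict(params)
    for _ in range(max_pages):
        body = fetch_page(query)
        results = body.get("results", [])
        yield from results
        cursor = (body.get("pagination") or {}).get("last_indexes")
        if not results or not cursor:
            break
        # Feed every cursor key back as a query parameter for the next page
        query = {**params, **cursor}


def fake_fetch(query):
    """Stand-in for the HTTP call: two canned pages keyed by the cursor."""
    pages = {
        None: {
            "results": [{"id": 1}, {"id": 2}],
            "pagination": {"last_indexes": {"last_index": "2"}},
        },
        "2": {
            "results": [{"id": 3}],
            "pagination": {"last_indexes": None},
        },
    }
    return pages[query.get("last_index")]


records = list(iter_results(fake_fetch, {"per_page": 2}))
print([r["id"] for r in records])
```

Passing the fetch function in as an argument keeps the pagination logic testable offline and leaves room to add rate-limit handling or retries in one place.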

Historical patterns: Citizens United and the super PAC explosion

The structural shift is visible in the data. In the 2008 election cycle, there were no IE-only super PACs (committee type O) in the committee master file—the category did not exist under that name. The 2010 cycle saw the first wave: Priorities USA Action, American Crossroads, and a handful of others registered immediately following the Citizens United decision in January 2010 and the SpeechNow ruling in March 2010. By the 2012 cycle, there were 1,310 super PACs registered with the FEC. By 2024, there were over 3,200.

The dollar amounts follow the same curve with a steeper slope. Total super PAC receipts in the 2012 cycle were approximately $828 million. In the 2020 cycle, they exceeded $3.4 billion. In 2024, preliminary FEC totals put combined super PAC spending above $4.5 billion in disclosed funds—and OpenSecrets estimates that 501(c)(4) pass-through dark money added at least another $1.5 billion that is not directly attributable in FEC data.

The 501(c)(4) contribution pattern is stable across cycles: the organizations that transfer the largest amounts to super PACs in the final 90 days of a cycle tend to be entities whose IRS 990 filings, when they appear 18–24 months later, show revenues that materially exceed any publicly identified funding sources. This gap, between what the 990 reports as total revenue and what can be traced to individually identified sources, is the empirical signature of dark money in the FEC-IRS cross-reference.

The Federal Regulatory Data Hub endpoint

The Federal Regulatory Data Hub ingests the FEC committee master, the PAC-to-PAC file, and the individual contribution file for each cycle as part of its federal election dataset group. The hub's cross-agency entity bridge resolves committee IDs against SEC EDGAR company identifiers (for corporate-affiliated PACs), FinCEN BSA data (for financial institution PACs), and the OFAC SDN list (for sanction-exposure cross-reference).

# Super PACs (committee type O) in the 2024 cycle
GET https://api.ai-analytics.org/v1/datasets/fec-committees?committee_type=O&cycle=2024

# PAC-to-PAC flows above $1M for a specific super PAC
GET https://api.ai-analytics.org/v1/datasets/fec-contributions/pac-to-pac?recipient_cmte_id=C00828210&min_amount=1000000

# Resolve a committee to cross-agency entity data
GET https://api.ai-analytics.org/v1/entity/C00828210

The _source field in hub responses links back to the original FEC filing so you can verify against the raw Schedule A or B document. The hub does not transform or summarize the FEC data; it normalizes committee names and resolves entity identities across the cross-agency bridge while leaving transaction amounts and dates unchanged from the source filing.

Practical notes

  • TRANSACTION_DT is a string, not a date. In most cycles the format is MMDDYYYY. In some older cycles it is MM/DD/YYYY. Parse defensively with a try-except across both formats before any time-series work.
  • Amended filings create duplicates. When a committee amends a previous filing, the FEC includes both the original and the amendment in the bulk file. The AMNDT_IND field identifies the filing type: N = new, A = amendment, T = termination. For analysis, keep only the latest amendment for each transaction. The SUB_ID field is the unique transaction identifier across amendments.
  • Names are not normalized across files. A donor named “SMITH, JOHN” in the individual file and “JOHN SMITH” in a Schedule A attached to a committee filing are the same person but formatted differently. The FEC does not enforce a canonical name format. Run your own normalization before any cross-file join.
  • The 48-hour reports are not in the bulk download. In the final 20 days before an election, committees are required to report contributions over $1,000 within 48 hours. These filings are available on the FEC's ORCA system and via the OpenFEC API but are not reflected in the cycle-end bulk files until the quarterly or post-election report is filed. For real-time election monitoring, use the API; for historical analysis, use the bulk files.
  • The 501(c)(4) disclosure gap is structural, not a bug. Congress has explicitly declined to mandate 501(c)(4) donor disclosure through legislation (the DISCLOSE Act failed repeatedly in the Senate). The FEC has been unable to act through rulemaking due to internal deadlock. The gap is not an artifact of incomplete data collection; it is the intended operating state of the current legal framework.
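The first two notes can be sketched as small helpers. Both are illustrations under stated assumptions. The date parser pads seven-digit values on the guess that a leading zero was lost when the column was read as a number. The deduplication rule, which keeps the row with the highest FILE_NUM for each (CMTE_ID, TRAN_ID) pair, is one possible rule assumed here and should be checked against the filings you work with. The sample rows are made up.

```python
from datetime import date, datetime


def parse_transaction_dt(raw):
    """Parse MMDDYYYY or MM/DD/YYYY; return None when neither format fits."""
    value = (raw or "").strip()
    if value.isdigit() and len(value) == 7:
        value = value.zfill(8)  # assume a leading zero was dropped on load
    for fmt in ("%m%d%Y", "%m/%d/%Y"):
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None


def latest_versions(rows):
    """Keep one row per (CMTE_ID, TRAN_ID), preferring the highest FILE_NUM.

    Assumption: a later filing carries a larger FILE_NUM, so the row with the
    largest value is treated as the most recent version of the transaction.
    """
    latest = {}
    for row in rows:
        key = (row["CMTE_ID"], row["TRAN_ID"])
        kept = latest.get(key)
        if kept is None or int(row["FILE_NUM"]) > int(kept["FILE_NUM"]):
            latest[key] = row
    return list(latest.values())


filings = [
    {"CMTE_ID": "C00000001", "TRAN_ID": "A1", "FILE_NUM": "100",
     "AMNDT_IND": "N", "TRANSACTION_AMT": "500", "TRANSACTION_DT": "10292024"},
    {"CMTE_ID": "C00000001", "TRAN_ID": "A1", "FILE_NUM": "200",
     "AMNDT_IND": "A", "TRANSACTION_AMT": "750", "TRANSACTION_DT": "10/29/2024"},
    {"CMTE_ID": "C00000001", "TRAN_ID": "B7", "FILE_NUM": "100",
     "AMNDT_IND": "N", "TRANSACTION_AMT": "300", "TRANSACTION_DT": "1052024"},
]

deduped = latest_versions(filings)
for row in deduped:
    print(row["TRAN_ID"], row["TRANSACTION_AMT"],
          parse_transaction_dt(row["TRANSACTION_DT"]))
```

Returning None for unparseable dates, rather than raising, lets a time-series job count and inspect the bad rows instead of failing partway through a multi-million-row file.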

For the STOCK Act congressional trading dataset — how legislators' financial disclosures can be joined against FEC contribution data to identify potential conflicts of interest between personal trades and donor relationships: Trading on the inside: using STOCK Act filings to track congressional stock transactions →

For the FARA foreign agent registration dataset — how foreign government lobbying intersects with super PAC donor networks through registered agents: Foreign agents in plain sight: mapping DC's hidden influence network with FARA data →