Dark money disclosed: using IRS Form 990 data to map political organization spending
The IRS publishes Form 990 filings for over 65,000 tax-exempt political organizations, covering revenues, expenditures, officer compensation, and political activity schedules. For 527 political committees, the 990 sits alongside FEC disclosures and provides a cross-reference. For 501(c)(4) social welfare organizations—the dark money vehicle—the 990 is often the only structured public window into organizations that collectively move billions of dollars into US elections without disclosing a single donor name.
The two-tier structure: 527s and 501(c)(4)s
Political organizations that engage in campaign activity are governed by two distinct sections of the Internal Revenue Code, each with different disclosure obligations and different relationships to the FEC.
A 527 political organization is an entity organized and operated primarily for the purpose of influencing the selection, nomination, election, or appointment of individuals to public office. Section 527 organizations must file IRS Form 8871 (notice of formation) and Form 8872 (periodic report of contributions and expenditures) with the IRS. If a 527 makes expenditures in connection with federal elections, it is also subject to FEC jurisdiction and must file as a political committee under the Federal Election Campaign Act. The IRS 8872 filings and the FEC data overlap for federally active 527s but diverge for organizations that operate only in state-level elections: those groups file with the IRS but not the FEC. The gap between “527s that file only with the IRS” and “527s that also file with the FEC” is a material source of incomplete disclosure in political finance data.
A 501(c)(4) social welfare organization is a fundamentally different animal. Its primary purpose must be promotion of community welfare—not political activity. The IRS has historically interpreted “primary purpose” to mean that political activity must constitute less than half of the organization's total activities, producing the informal 49% threshold that political operatives use as a ceiling. A 501(c)(4) files Form 990 annually with the IRS, which is public. It is not required to register with the FEC unless it crosses the threshold for being treated as a political committee under FECA—a threshold most 501(c)(4)s carefully avoid by running issue ads rather than express advocacy. Crucially, a 501(c)(4) is not required to publicly disclose its donors under any currently operative federal law. The Form 990 Schedule B, which lists major donors, is filed with the IRS but is redacted from the public version for 501(c)(4) organizations.
Why 501(c)(4)s are the dark money vehicle
The legal infrastructure enabling the dark money flow through 501(c)(4)s was assembled in stages. The Supreme Court's 2010 decision in Citizens United v. FEC established that corporations and nonprofits have a First Amendment right to make independent expenditures in federal elections. The D.C. Circuit's 2010 decision in SpeechNow.org v. FEC extended this to individual contributions to independent expenditure-only committees, creating the legal basis for super PACs. Together these decisions created a structure where a 501(c)(4) can raise unlimited contributions from donors who want no public attribution, then transfer those funds to a super PAC that is legally required to disclose the 501(c)(4) as a donor—but not the individuals behind the 501(c)(4).
The result is a layered disclosure gap. The super PAC's FEC Schedule A shows a transfer from, for example, “Citizens for a Better America.” The IRS 990 for Citizens for a Better America shows total revenues of $47 million and political expenditures of $44 million. Neither document names the people who funded the $47 million. The IRS Schedule B that would name major contributors is withheld from the public copy. This is not a failure of data collection; it is the structural consequence of how Congress has written the disclosure laws and how the IRS has interpreted its statutory authority over tax-exempt organizations.
What Form 990 contains for political organizations
Form 990 is a 12-part annual information return that tax-exempt organizations with gross receipts above $200,000 (or total assets above $500,000) must file with the IRS. Smaller organizations file Form 990-EZ; the very smallest (under $50,000 gross receipts) file only the 990-N e-Postcard. For political organizations of meaningful scale, the full 990 is the relevant document. Its key sections for political finance analysis are:
- Part I (Summary). Total revenues, total expenses, net assets at beginning and end of year, total number of employees and volunteers. This is the top-level financial summary. It does not itself flag political activity; that question sits in the Part IV checklist, where a “Yes” on the political campaign activities line triggers Schedule C.
- Part IV (Checklist of Required Schedules). A 38-question checklist that determines which schedules must be attached. Questions relevant to political organizations include Line 3 (political campaign activities, Schedule C), Line 4 (lobbying, Schedule C), and Lines 14a and 14b (foreign offices and activities, Schedule F). The checklist is the fastest way to identify which supplemental disclosures an organization is required to make.
- Part VII (Compensation of Officers, Directors, Trustees, Key Employees, Highest Compensated Employees, and Independent Contractors). Section A lists every officer, director, trustee, and key employee by name, title, average hours per week, and total compensation (base, bonus, deferred, nontaxable benefits). Section B lists the five highest-compensated independent contractors. The compensation data for major political nonprofits frequently reveals significant executive pay relative to the organization's stated social welfare mission.
- Schedule C (Political Campaign and Lobbying Activities). Part I covers political campaign activity. Part I-A applies to every 501(c) and 527 filer and asks for a description of direct and indirect political campaign activities plus the total political campaign expenditures. Part I-B applies only to 501(c)(3) organizations, which may not engage in political campaign activity (not relevant here). Part I-C applies to 501(c) organizations other than 501(c)(3)s, which includes 501(c)(4)s, and requires the amount spent directly on section 527 exempt function activities, the amount contributed to other organizations for those activities, and the names of the 527 organizations paid. Part II covers lobbying by 501(c)(3)s, and Part III covers dues and lobbying-related notices for 501(c)(4), (5), and (6) organizations. Schedule C is the primary source for political expenditure totals in 990 data.
- Schedule O (Supplemental Information to Form 990 or 990-EZ). Schedule O is a free-form narrative attachment where organizations explain anything that cannot be captured in the main return's structured fields. For political organizations, Schedule O is where the explanations of “primary purpose” under the social welfare doctrine live. An organization arguing that its 48% political activity still qualifies as “not primarily” political will make that argument in Schedule O. It is also where unusual revenue streams, related-party transactions, and governance matters that triggered checklist questions get explained. Schedule O is the narrative layer of the 990 and the most analytically rich section for investigative research.
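The filing thresholds quoted at the top of this section reduce to a short decision rule. The sketch below is illustrative only: the function name required_form is hypothetical, it encodes just the three thresholds stated in this article, and it ignores the special cases that the IRS form instructions carve out.

```python
def required_form(gross_receipts: float, total_assets: float) -> str:
    """Return the 990-series form implied by the thresholds quoted in this article."""
    # Full Form 990: gross receipts >= $200,000 or total assets >= $500,000
    if gross_receipts >= 200_000 or total_assets >= 500_000:
        return "990"
    # 990-N e-Postcard: the very smallest filers (under $50,000 gross receipts)
    if gross_receipts < 50_000:
        return "990-N"
    # Everything in between files the 990-EZ
    return "990-EZ"
```

For political finance work the practical use is as a filter: an organization that falls into the 990-EZ or 990-N bucket will not have the full Part VII and Schedule O detail discussed above.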
The IRS bulk XML release
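The IRS has released electronically filed 990s as XML in an AWS S3 bucket, which has served as the canonical machine-readable source for large-scale 990 analysis. The bucket is named irs-form990 and is publicly accessible without authentication. Index files by year map each filing to an S3 object path. The IRS also distributes Form 990 XML through its own downloads page on irs.gov, and hosting arrangements have changed over time, so confirm that an endpoint is live before building a pipeline on it.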
# Download the index for filings submitted in 2023
curl -O https://s3.amazonaws.com/irs-form990/index_2023.json
# Structure of each index entry:
# {
# "EIN": "123456789",
# "TaxPeriod": "202212",
# "DLN": "...",
# "FormType": "990",
# "URL": "https://s3.amazonaws.com/irs-form990/202301319042200016_public.xml",
# "OrganizationName": "CITIZENS FOR A BETTER AMERICA",
# "SubmittedOn": "2023-11-15",
# "ObjectId": "202301319042200016",
# "LastUpdated": "2023-11-16T08:00:00",
# "IsElectronic": true,
# "IsAvailable": true
# }
# Download the XML for a specific filing
curl -O https://s3.amazonaws.com/irs-form990/202301319042200016_public.xml

The XML schema varies by form type. The IRS publishes formal XSD schemas at irs.gov/charities-non-profits/current-valid-xml-schemas-and-business-rules. The three schemas you will encounter are:
- IRS990: Full Form 990. The most complete disclosure, used by organizations with gross receipts ≥ $200,000 or total assets ≥ $500,000. The XML element for total political expenditures is PoliticalCampaignActvtsAmt in the Schedule C portion of the return.
- IRS990EZ: Form 990-EZ. Used by smaller organizations. The political activity fields are present but less granular; Schedule C is replaced by Part V of the EZ form.
- IRS990PF: Form 990-PF. Used by private foundations. 501(c)(4) political organizations do not use this form. If you encounter a 990-PF for an entity in your political organization analysis, the EIN is likely misclassified in your working dataset.
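Because each schema needs its own handling, it can help to partition the index by form type before downloading any XML. A minimal sketch, assuming each index entry is a dict shaped like the example entry shown earlier. The FormType and EIN keys come from that example; the function name group_by_form_type and the "990EZ" and "990PF" values are assumptions for illustration.

```python
from collections import defaultdict


def group_by_form_type(entries, wanted=("990", "990EZ", "990PF")):
    """Group index entries by their FormType, skipping any other form types."""
    groups = defaultdict(list)
    for entry in entries:
        form_type = entry.get("FormType")
        if form_type in wanted:
            groups[form_type].append(entry)
    return dict(groups)


def unexpected_pf_eins(groups):
    """EINs that filed a 990-PF: candidates for a misclassification review."""
    return sorted({entry["EIN"] for entry in groups.get("990PF", [])})
```

Running unexpected_pf_eins over the grouped index gives a quick list of entities to re-check before they contaminate a 501(c)(4) analysis.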
Parsing the bulk XML at scale requires handling schema version changes across years. The IRS has updated the 990 XML schema multiple times since the 2012 e-file mandate; element names for the same concept have changed between versions. The stable approach is to key on the schema version attribute in the root element and maintain a mapping from schema version to field path for each variable you are extracting.
import json
import xml.etree.ElementTree as ET

import boto3
from botocore import UNSIGNED
from botocore.client import Config

# Anonymous (unsigned) client: the bucket is public, so no AWS credentials are needed
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

# Load the index
index_obj = s3.get_object(Bucket='irs-form990', Key='index_2023.json')
index = json.loads(index_obj['Body'].read())

# Political activity element name varies by schema version
POLITICAL_EXPENDITURE_TAGS = {
    '2013v3.0': 'PoliticalCampaignActyvtsAmt',  # note typo in early schemas
    '2014v5.0': 'PoliticalCampaignActvtsAmt',
    '2015v2.0': 'PoliticalCampaignActvtsAmt',
    '2016v3.0': 'PoliticalCampaignActvtsAmt',
    '2022v5.0': 'PoliticalCampaignActvtsAmt',
}
DEFAULT_TAG = 'PoliticalCampaignActvtsAmt'

def extract_political_expenditure(xml_bytes):
    root = ET.fromstring(xml_bytes)
    # Reuse the root's namespace (if any) so find() matches namespaced elements
    ns = root.tag.split('}')[0] + '}' if '}' in root.tag else ''
    version = root.attrib.get('returnVersion', 'unknown')
    tag = POLITICAL_EXPENDITURE_TAGS.get(version, DEFAULT_TAG)
    elem = root.find(f'.//{ns}{tag}')
    return int(elem.text) if elem is not None and elem.text else 0

ProPublica Nonprofit Explorer
ProPublica's Nonprofit Explorer provides a pre-parsed API over the IRS bulk XML data, making it practical to query individual organizations without building a full XML parsing pipeline. The API is available at https://projects.propublica.org/nonprofits/api/v2/ and requires no authentication for standard queries.
The three primary endpoints for political organization research are:
- Organization search: /search.json?q=QUERY&ntee=W. NTEE code W30 covers civic, social, and community organizations frequently used by 501(c)(4) political vehicles. The q parameter searches organization name and EIN. Results include ein, name, state, ntee_code, and updated (last 990 filing year).
- Organization detail: /organizations/{EIN}.json. Returns all 990 filings for an EIN, each with extracted financial metrics: totrevenue, totexpenditures, totnetassetend, politicalactivities (the Schedule C total), and topcompensation (officer pay from Part VII). The filings_with_data array is the list of parsed filings sorted descending by tax period.
- Filing history: /organizations/{EIN}/filings.json. Includes direct download links to the IRS XML for each year. Use this to obtain the raw XML when you need fields not extracted by ProPublica's parser (Schedule O narrative, for example, is not in the API response).
import requests

BASE = 'https://projects.propublica.org/nonprofits/api/v2'

def get_990_filings(ein: str) -> list[dict]:
    """Return all parsed 990 filings for an EIN."""
    r = requests.get(f'{BASE}/organizations/{ein}.json', timeout=30)
    r.raise_for_status()
    data = r.json()  # parse the response body once
    org = data['organization']
    filings = data.get('filings_with_data', [])
    return [
        {
            'ein': ein,
            'name': org['name'],
            'tax_period': f['tax_prd_yr'],
            # "or 0" also covers fields that are present but null
            'total_revenue': f.get('totrevenue') or 0,
            'total_expenses': f.get('totexpenditures') or 0,
            'political_expenditures': f.get('politicalactivities') or 0,
            'officer_compensation': f.get('topcompensation') or 0,
        }
        for f in filings
    ]

# Search for organizations matching a query
def search_nonprofits(query: str, page: int = 0) -> list[dict]:
    r = requests.get(f'{BASE}/search.json', params={'q': query, 'page': page}, timeout=30)
    r.raise_for_status()
    return r.json().get('organizations', [])

Three research use cases
The combination of IRS bulk XML and the ProPublica API enables several classes of political finance research that are not possible from FEC data alone.
Use case 1: 501(c)(4)s where political activity exceeds social welfare activity. Schedule C requires organizations to report total direct and indirect political expenditures. Part I of the 990 reports total expenses. The ratio of Schedule C political expenditures to total expenses is a direct, if rough, test of the “primary purpose” doctrine. Organizations where this ratio exceeds 49% are technically operating outside IRS guidance. In practice, the IRS has rarely revoked 501(c)(4) status for excessive political activity (the 2013 BOLO controversy effectively froze enforcement for years), but the ratio remains analytically meaningful for identifying organizations operating near or above the limit.
# Find 501(c)(4)s with high political-to-total-expense ratios
threshold = 0.40  # flag at 40%, materially below the 49% ceiling
# target_eins: an iterable of EIN strings you have assembled elsewhere
for ein in target_eins:
    filings = get_990_filings(ein)
    for f in filings:
        if f['total_expenses'] > 0:
            ratio = f['political_expenditures'] / f['total_expenses']
            if ratio >= threshold:
                print(f"{f['name']} {f['tax_period']} {ratio:.1%}")

Use case 2: Executive compensation at major political nonprofits. Part VII compensation disclosures for 501(c)(4) political organizations frequently show total compensation packages exceeding $500,000 per year for executive directors of organizations whose primary visible output is political advertising. The analytic question is whether the compensation is proportional to the organization's stated social welfare activities or primarily a function of its political fundraising volume. Cross-referencing Part VII compensation with Schedule C political expenditure totals produces a “compensation-to-political-spending” ratio that identifies the organizations where administrative overhead is unusually high relative to actual political activity. These are candidates for pass-through analysis.
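The compensation-to-political-spending ratio from use case 2 can be sketched as follows. This assumes filing dicts with officer_compensation and political_expenditures keys, matching the shape returned by get_990_filings earlier in the article; the function names compensation_ratio and rank_by_compensation_ratio are hypothetical.

```python
from typing import Optional


def compensation_ratio(filing: dict) -> Optional[float]:
    """Officer compensation divided by political expenditures.

    Returns None when no political expenditures are reported, because the
    ratio is undefined in that case.
    """
    political = filing.get("political_expenditures") or 0
    if political <= 0:
        return None
    return (filing.get("officer_compensation") or 0) / political


def rank_by_compensation_ratio(filings):
    """Return (ratio, filing) pairs sorted from highest ratio to lowest."""
    scored = [(compensation_ratio(f), f) for f in filings]
    scored = [(ratio, f) for ratio, f in scored if ratio is not None]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Filings with no reported political spending are dropped rather than treated as zero or infinite, so the ranking only compares organizations where the ratio means something.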
Use case 3: Revenue-to-political-spending ratio and pass-through vehicles. A pass-through 501(c)(4) receives donations, retains a small administrative fraction, and transfers the remainder to affiliated super PACs or to direct political expenditures. Its revenue-to-political-spending ratio approaches 1.0. Organizations with Part I total revenues above $10 million and Schedule C political expenditures exceeding 80% of revenue, combined with Part VII total compensation below $200,000, are the clearest examples of pass-through vehicles: most of the money moves through, very little goes to operations, and the compensation structure is minimal. These organizations are optimized for moving dark money, not running social welfare programs.
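The three pass-through criteria above translate directly into a filter. A minimal sketch, assuming the same filing dict shape as get_990_filings and treating its officer_compensation value as a stand-in for Part VII total compensation (an assumption; the two may not be identical). The function name is_pass_through is hypothetical, and the default thresholds are the ones stated in the paragraph above.

```python
def is_pass_through(
    filing: dict,
    min_revenue: float = 10_000_000,
    min_political_share: float = 0.80,
    max_compensation: float = 200_000,
) -> bool:
    """Flag filings matching the pass-through profile described in the text."""
    revenue = filing.get("total_revenue") or 0
    political = filing.get("political_expenditures") or 0
    compensation = filing.get("officer_compensation") or 0
    # Revenue must exceed the floor; this also prevents division by zero below
    if revenue <= min_revenue:
        return False
    political_share = political / revenue
    return political_share > min_political_share and compensation < max_compensation
```

Using the figures from the earlier example, revenues of $47 million and political expenditures of $44 million give a share of roughly 93.6%, so that filing would be flagged whenever reported compensation is under $200,000.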
The 527 database: IRS Form 8872 and the FEC gap
Section 527 political organizations that are not registered as federal political committees under FECA must file Form 8872 with the IRS. The 8872 is a periodic report covering contributions received and expenditures made. The IRS publishes 8872 data in a searchable database at forms.irs.gov/app/pod/basicSearch/search and in bulk CSV download.
The analytical gap between 527s that file only with the IRS and those that also file with the FEC is a persistent data integrity problem. A 527 that makes expenditures in connection with state elections but not federal elections files 8872 with the IRS. A 527 that crosses into federal political activity must register with the FEC as a political committee. The migration between IRS-only and FEC-registered status is not always clean: some organizations make expenditures in both state and federal elections and file with both; others make federal-adjacent expenditures (ads referencing federal candidates without “express advocacy” language) while asserting they are state-only. The 8872 contributions list names individual contributors—unlike the 990 Schedule B for 501(c)(4)s—making it the more useful disclosure document for 527s, but only for the subset of 527 activity that is channeled through organizations below the FEC threshold.
Combining the IRS 8872 data with FEC data to get a complete picture of political committee spending takes four steps:
- Download the IRS 8872 bulk CSV and extract organization names, EINs, total contributions, and total expenditures by filing period.
- Match EINs against the FEC committee master file to identify 527s that are also registered FEC political committees. These organizations have FEC CMTE_IDs that can be joined against the FEC bulk data for the detailed transaction-level disclosure.
- For EINs present in the 8872 data but absent from the FEC committee master, the 8872 is the only federal disclosure. Extract individual contributor names and amounts from the 8872 contributions schedule.
- Aggregate by EIN across both sources to compute total political receipts (8872 contributions + FEC Schedule A for dual-filers) and total political expenditures (8872 expenditures + FEC Schedule B and E).
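The matching and aggregation steps above can be sketched in plain Python. Everything about the input shapes here is an assumption for illustration: irs_rows stands for parsed 8872 rows with ein, contributions, and expenditures keys, ein_to_cmte is a hypothetical EIN-to-committee-ID crosswalk that you build yourself, and fec_totals holds per-committee sums you have already computed from the FEC schedules.

```python
def combine_527_totals(irs_rows, ein_to_cmte, fec_totals):
    """Aggregate 8872 totals by EIN and add FEC totals for dual filers.

    irs_rows:    iterable of dicts with 'ein', 'contributions', 'expenditures'
    ein_to_cmte: dict mapping EIN -> FEC committee ID (hypothetical crosswalk)
    fec_totals:  dict mapping committee ID -> {'receipts': ..., 'disbursements': ...}
    """
    combined = {}
    # Sum 8872 contributions and expenditures across filing periods
    for row in irs_rows:
        rec = combined.setdefault(
            row["ein"],
            {"receipts": 0, "expenditures": 0, "cmte_id": None, "irs_only": True},
        )
        rec["receipts"] += row["contributions"]
        rec["expenditures"] += row["expenditures"]
    # Add the FEC side for EINs that also appear in the crosswalk
    for ein, rec in combined.items():
        cmte_id = ein_to_cmte.get(ein)
        if cmte_id is None:
            continue  # the 8872 is the only federal disclosure for this EIN
        fec = fec_totals.get(cmte_id, {})
        rec["cmte_id"] = cmte_id
        rec["irs_only"] = False
        rec["receipts"] += fec.get("receipts", 0)
        rec["expenditures"] += fec.get("disbursements", 0)
    return combined
```

The irs_only flag preserves the distinction the text draws: for those EINs, contributor detail has to come from the 8872 contributions schedule. Summing both sources for dual filers assumes the same dollars are not reported in both places, which is worth verifying per organization.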
Cross-references to other datasets
Form 990 data becomes most analytically powerful when cross-referenced against complementary federal datasets.
FEC individual contributions and independent expenditures. For 527s with FEC filings, the 990 provides annual aggregate financial data while the FEC Schedule A provides transaction-level individual contributor detail (for contributions over $200) and Schedule E provides individual independent expenditure records. The 990 and FEC data cover different time resolutions: the 990 is annual (matching the tax year), the FEC is reported on a quarterly or election-cycle-triggered basis. Building a combined time series requires aligning the two reporting calendars.
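One way to align the two calendars is to assign every FEC transaction date to the 990 tax period that contains it, then sum within each period. A minimal sketch, assuming tax periods are written as YYYYMM for the fiscal year end (as in the TaxPeriod index field shown earlier) and that you know the organization's fiscal-year-end month. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def tax_period_for(txn_date: date, fiscal_year_end_month: int = 12) -> str:
    """Return the YYYYMM tax period (fiscal year end) containing txn_date."""
    if not 1 <= fiscal_year_end_month <= 12:
        raise ValueError("fiscal_year_end_month must be between 1 and 12")
    # Dates after the fiscal-year-end month belong to the next fiscal year
    if txn_date.month <= fiscal_year_end_month:
        year = txn_date.year
    else:
        year = txn_date.year + 1
    return f"{year}{fiscal_year_end_month:02d}"


def sum_by_tax_period(transactions, fiscal_year_end_month: int = 12):
    """Sum (date, amount) pairs into totals keyed by tax period."""
    totals = defaultdict(float)
    for txn_date, amount in transactions:
        totals[tax_period_for(txn_date, fiscal_year_end_month)] += amount
    return dict(totals)
```

For a calendar-year filer the mapping is trivial, but for an organization whose fiscal year ends in June, a contribution dated September 2022 belongs to tax period 202306, not 202212.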
FARA (Foreign Agents Registration Act). Political nonprofits that receive funds from or act under the direction of foreign principals are required to register under FARA. The DOJ publishes a FARA database covering registered agents, principals, and supplemental statements (quarterly financial disclosures). Cross-referencing 990 filer EINs against the FARA registrant database—by organization name and state of incorporation where EINs do not match directly—identifies the small subset of political nonprofits with disclosed foreign connections. The FARA supplemental statements include financial data (disbursements on behalf of the foreign principal) that can be compared against the 990 political expenditure totals.
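Matching by organization name and state, as described above, usually needs some normalization first, because the same entity is rarely spelled identically in two federal datasets. The sketch below shows one simple approach; the suffix list, the function names, and the name and state keys on both record types are assumptions for illustration, not the actual FARA or 990 field names.

```python
import re

# Tokens dropped before comparison (an illustrative list, not exhaustive)
_NOISE_TOKENS = {"inc", "incorporated", "llc", "corp", "corporation", "co", "the"}


def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in _NOISE_TOKENS)


def match_key(name: str, state: str) -> tuple:
    """Build the (normalized name, state) key used for exact matching."""
    return (normalize_name(name), state.strip().upper())


def match_registrants(filers, registrants):
    """Return (ein, registrant name) pairs whose name and state keys agree.

    filers:      iterable of dicts with 'ein', 'name', 'state'
    registrants: iterable of dicts with 'name', 'state'
    """
    by_key = {}
    for filer in filers:
        by_key.setdefault(match_key(filer["name"], filer["state"]), []).append(filer["ein"])
    matches = []
    for reg in registrants:
        for ein in by_key.get(match_key(reg["name"], reg["state"]), []):
            matches.append((ein, reg["name"]))
    return matches
```

Exact matching on a normalized key keeps false positives low at the cost of missed matches; any hit should still be confirmed by hand before it is reported as a foreign connection.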
SEC proxy filings (DEF 14A) and 8-K disclosures. Publicly traded corporations are not required to disclose political contributions in SEC filings, but many voluntarily include corporate political activity in proxy statement governance disclosures. Cross-referencing SEC EDGAR filer CIK numbers against the FARA database and FEC committee affiliation data (via the CONNECTED_ORG_NM field in the FEC committee master) identifies corporations with disclosed relationships to political committees. The resulting dataset maps the corporate-to-nonprofit-to-super-PAC funding chain from the corporate disclosure side.
Limitations
Four structural constraints bound what 990-based political finance research can reveal:
- The dark money gap is architectural. The most critical information—who funded the 501(c)(4)—is legally withheld from the public 990. Schedule B major donor disclosure is submitted to the IRS but redacted from the copy released under 26 U.S.C. § 6104. No current federal law requires 501(c)(4) donor disclosure. The 990 tells you the money moved; it does not tell you who put the money in.
- The 2012 e-file cliff. The IRS bulk XML release covers electronically filed returns since 2012, when e-filing became mandatory for larger organizations. Pre-2012 990s exist as paper filings and scanned PDFs. GuideStar (now Candid) has digitized many historical returns, but the structured data coverage drops sharply before 2012. For organizations formed before 2012, you may have a decade or more of activity with no machine-readable 990 data.
- Late and amended filings distort time series. The IRS grants automatic six-month extensions, so a 990 for the tax year ending December 2023 may not appear in the bulk data until mid-2025. Organizations in financial difficulty or legal dispute may file amended returns years after the original. The ProPublica API reflects the most recently filed version for each tax period, but building a consistent time series requires tracking amendment dates across the S3 index files.
- IRS delays in releasing bulk data. The IRS updates the S3 index files roughly monthly, but there are documented gaps of several months between a filing's acceptance date and its appearance in the bulk release. For research on organizations that filed recently—within the past 6 to 12 months—the ProPublica API or the IRS Tax Exempt Organization Search tool at apps.irs.gov/app/eos/ may provide coverage that has not yet reached the bulk S3 release.
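For the amended-filing problem noted in the list above, one workable rule is to keep only the most recently submitted entry for each EIN and tax period. A minimal sketch, assuming index entries shaped like the example earlier in the article, with EIN, TaxPeriod, and an ISO-formatted SubmittedOn date. The function name latest_filings is hypothetical.

```python
def latest_filings(entries):
    """Keep, for each (EIN, TaxPeriod), the entry with the latest SubmittedOn."""
    latest = {}
    for entry in entries:
        key = (entry["EIN"], entry["TaxPeriod"])
        current = latest.get(key)
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings
        if current is None or entry["SubmittedOn"] > current["SubmittedOn"]:
            latest[key] = entry
    return latest
```

Because entries for the same tax period can be spread across several yearly index files, the input should be the concatenation of every index you have loaded, not a single year.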
For the FEC bulk data that shows the other side of the same money flow—how 501(c)(4) transfers appear as super PAC contributions and how to trace the donor-to-expenditure chain in FEC Schedule A and B files: Follow the money: mapping dark money and super PAC flows with FEC bulk data →
For the FARA dataset—how to cross-reference 990 filers against DOJ foreign agent registrations to identify political nonprofits with disclosed foreign principal relationships: Foreign agents in plain sight: mapping DC's hidden influence network with FARA data →
For the STOCK Act congressional trading dataset—how to join legislative financial disclosures against FEC contribution data and 990 officer compensation records to surface potential conflicts of interest: Trading on the inside: using STOCK Act filings to track congressional stock transactions →