
Senate LDA lobbying disclosures: mapping $4 billion in annual influence spending

10 min read · AI Analytics
Federal Data · Lobbying · Transparency · Politics

Every quarter, thousands of Washington lobbying firms file reports with the Senate Office of Public Records disclosing exactly who they represent, which policy issues they worked, which specific bills they lobbied on, and how much money changed hands. The aggregate figure—roughly $4 billion per year in disclosed lobbying spending—understates the true scope of organized influence on federal policy, but the disclosed portion is detailed, machine-readable, and available via a well-documented REST API. Most analysts never go beyond the top-line totals. The filing-level data is far richer.

What the Lobbying Disclosure Act requires

The Lobbying Disclosure Act of 1995, substantially amended by the Honest Leadership and Open Government Act of 2007, requires any individual who qualifies as a “lobbyist” under the statute to register with the Senate Office of Public Records (SOPR) and file quarterly activity reports. The threshold for registration is specific: a person who makes more than one lobbying contact and spends at least 20 percent of their working time lobbying for a client during a three-month period must register. The firm that employs such a person must also register as a “registrant.”

Registration must occur within 45 days of the first lobbying contact or of the date the 20-percent threshold is crossed, whichever comes first. The registration document, the LD-1, captures the registrant name and address, the client name and address, a description of the client's business, the general issue areas to be lobbied, and the names of all lobbyists who will work on the engagement. The lobbyist names field also requires disclosure of any “covered official” positions the lobbyist held in the previous two years: any federal executive branch position confirmed by the Senate, any Senior Executive Service position, any position in the Office of the Vice President, or any congressional staff position at the level of deputy assistant secretary or above. This revolving-door disclosure is one of the most analytically useful fields in the dataset.

The quarterly LD-2: what gets reported

After registration, the quarterly LD-2 filing is the primary unit of analysis in the LDA database. It is structured in two main sections. The first identifies the registrant, the client, and the reporting period. The second, the lobbying activities section, is a series of entries, each linking a general issue code to the specific bills lobbied and to a list of the chambers and executive agencies contacted.

The LDA specifies exactly 79 general issue codes covering every recognized domain of federal policy. The taxonomy includes AGR (Agriculture), TAX (Taxation), HCR (Health Issues), DEF (Defense), ENV (Environment/Superfund), FIN (Financial Institutions/Investments/Securities), ENG (Energy/Nuclear), TRD (Trade), SCI (Science/Technology), IMM (Immigration), and MMM (Medicare/Medicaid), among dozens of others. A single LD-2 can list multiple issue codes, each with its own set of specific bills and agency contacts. The specific bills field is free text and accepts bill numbers in any congressional format; lobbyists list specific House and Senate bill numbers, and sometimes Senate amendments, appropriations vehicles, and draft regulatory documents.

The dollar amount disclosure sits at the filing level, not the issue-code level. Firms that earn lobbying income report estimated income in the appropriate rounding band: below $5,000 (reported as zero), $5,000–$10,000 (reported as $5,000), and above $10,000 (rounded to the nearest $10,000). Large corporate clients that retain lobbying firms on an expense basis report total lobbying expenses using the same $10,000 rounding rule. For example, a corporation that spent $4,323,400 on lobbying in a quarter reports $4,320,000 in the expenses field after rounding. This granularity is sufficient for most analysis; it is not sufficient for auditing individual invoice amounts.
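The rounding bands are easier to reason about as a small function. The sketch below encodes the bands exactly as described in this article; the band edges and the half-rounds-up behaviour are assumptions for illustration, not a quotation of the Senate filing guidance, and `reported_amount` is a hypothetical helper name.

```python
def reported_amount(actual_dollars: int) -> int:
    """Map an actual quarterly amount to the figure that would appear on the LD-2.

    Band edges follow the description in this article (an assumption).
    """
    if actual_dollars < 5_000:
        return 0          # below the de minimis band: reported as zero
    if actual_dollars <= 10_000:
        return 5_000      # the $5,000-$10,000 band: reported as $5,000
    # Above $10,000: round to the nearest $10,000, with halves rounding up.
    # Integer arithmetic avoids Python's round-half-to-even behaviour.
    return (actual_dollars + 5_000) // 10_000 * 10_000


# Print a few actual amounts next to their reported values.
for actual in (4_999, 7_500, 14_999, 15_000, 4_323_400):
    print(f"{actual:>9,} -> {reported_amount(actual):>9,}")
```

Running the loop makes the discontinuity at each $5,000 boundary visible, which is why reported amounts are best treated as order-of-magnitude figures.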

The filing also records the active or terminated status of the engagement. Termination filings—LD-2s with a “Termination” flag—close out a registration when the lobbying relationship ends. The combination of registration date, all LD-2 filing dates, and the termination flag gives a complete timeline for any client-registrant relationship.

The LD-203 campaign contribution reports

In addition to the quarterly LD-2, the 2007 HLOGA amendments created a semiannual reporting obligation: the LD-203. Filed twice a year (covering January–June and July–December), the LD-203 requires every registered lobbyist and every registrant to disclose all contributions made to federal candidates, political party committees, PACs, and presidential inaugural committees. The LD-203 also requires disclosure of any payments to entities that provide event space for, or pay for costs associated with, a congressional fundraiser.

The LD-203 is the crucial bridge between LDA lobbying disclosures and FEC campaign finance data. A lobbyist for a pharmaceutical trade association who contributed $5,000 to the Senate Finance Committee chair's campaign committee in a year when the committee was marking up drug-pricing legislation appears in both datasets: in the LD-203 as a lobbying-community contribution and in the FEC Schedule A filings as an individual or PAC contribution. Joining the two datasets on contributor name, employer, and election cycle is the methodological core of most journalistic investigations into pay-to-play patterns in federal lobbying.
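A minimal sketch of that join, assuming both sources have already been loaded into lists of dictionaries. The field names, the sample rows, and the placeholder committee IDs are all hypothetical; real LD-203 and FEC records use different schemas and need far more careful name matching than this key provides.

```python
import re
from collections import defaultdict


def join_key(name: str, employer: str, cycle: int) -> tuple:
    """Build a crude match key: lowercase, drop punctuation, sort the name tokens."""
    def tokens(text: str) -> list:
        return re.sub(r"[^a-z0-9 ]", "", text.lower()).split()

    # Sorting name tokens makes "Doe, Jane" and "JANE DOE" compare equal.
    return (" ".join(sorted(tokens(name))), " ".join(tokens(employer)), cycle)


# Hypothetical records; field names and values are illustrative only.
ld203_rows = [
    {"contributor": "Doe, Jane", "employer": "Example Strategies LLC",
     "cycle": 2024, "amount": 5000},
]
fec_rows = [
    {"contributor": "JANE DOE", "employer": "Example Strategies, LLC",
     "cycle": 2024, "committee": "C00000000"},
    {"contributor": "JOHN ROE", "employer": "Other Firm",
     "cycle": 2024, "committee": "C00000001"},
]

# Index the FEC side once, then look up each LD-203 row by the same key.
fec_index = defaultdict(list)
for row in fec_rows:
    fec_index[join_key(row["contributor"], row["employer"], row["cycle"])].append(row)

matches = [
    (ld, fec)
    for ld in ld203_rows
    for fec in fec_index.get(join_key(ld["contributor"], ld["employer"], ld["cycle"]), [])
]

for ld, fec in matches:
    print(ld["contributor"], "->", fec["committee"])
```

An exact-key join like this trades recall for precision: it misses nicknames and employer abbreviations, so unmatched rows deserve a manual or fuzzy second pass.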

The scale of disclosed spending

Total disclosed lobbying spending has ranged between $3.3 billion and $4.1 billion per year for most of the past decade. An earlier high of $3.73 billion came in 2010, the year of the Affordable Care Act implementation debates and Dodd-Frank, when healthcare and financial services clients drove what was then record spending. Recent years have settled into a stable $3.5–3.8 billion range.

The four sectors that consistently lead the annual totals are healthcare and pharmaceuticals, finance and insurance, defense and aerospace, and energy (oil, gas, electric utilities, and nuclear). Healthcare and pharma typically account for the largest single sector by reported spending; in recent cycles, the Pharmaceutical Research and Manufacturers of America (PhRMA) and individual large pharmaceutical manufacturers have filed some of the highest per-registrant expense figures. The defense sector is notable for the concentration of spending among a small number of contractors relative to the size of their federal revenue; per-dollar-of-revenue lobbying intensity is highest in defense.

The $4 billion figure is a lower bound for two structural reasons. First, the registration thresholds leave a substantial amount of activity outside the system: contacts by people who fall below the 20-percent-of-time requirement, and engagements below the $5,000 de minimis level, are never registered and never disclosed. Second, coalition lobbying, in which a trade association lobbies on behalf of dozens of member companies, attributes spending to the association rather than disaggregating it to members. A technology company that contributes $2 million to a trade association that in turn lobbies on issues favorable to that company does not disclose that spending as lobbying expenditure in its own LD-2; the association files the LD-2.

How to access the data

The Senate SOPR publishes LDA data through two mechanisms: a REST API at lda.senate.gov/api/v1/ and bulk XML downloads at lda.senate.gov/system/public/. The API is unusually well-documented for a Senate system: it provides an OpenAPI schema, supports filtering by filing year, filing period, registrant name, client name, and issue code, and returns paginated JSON. No API key is required.

The primary API endpoints are:

  • /api/v1/filings/ — All LD-2 quarterly reports. Filter parameters include filing_year, filing_period (Q1, Q2, Q3, Q4, H1, H2, ANNUAL, and termination variants), registrant_name, client_name, and issue_code_id. Each result includes the full lobbying activities array with general issue codes, specific issues text, bill numbers, and houses/agencies contacted.
  • /api/v1/registrants/ — Registrant-level records with address, registration date, and a list of all associated client engagements. Useful for building the firm-level view of which lobbying shops work across which issue areas.
  • /api/v1/clients/ — Client records. The client name field is self-reported and not normalized; the same corporation may appear under multiple name variants across different registrants and different filings.
  • /api/v1/lobbyists/ — Individual lobbyist records, including the covered official history. This endpoint is the entry point for revolving-door analysis.
  • /api/v1/contributions/ — LD-203 campaign contribution reports. Filter by lobbyist name, registrant, contribution date range, and recipient name.
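As a rough illustration of how the endpoints and filters above compose, the sketch below builds query URLs without sending any requests. The filter names are taken from the list above; `endpoint_url` is a hypothetical helper, and the values the live API accepts for each filter should be checked against its OpenAPI schema.

```python
from urllib.parse import urlencode

BASE = "https://lda.senate.gov/api/v1"


def endpoint_url(resource: str, **filters) -> str:
    """Return a query URL for one of the endpoints listed above.

    Filters whose value is None are dropped so callers can pass optional
    arguments straight through. Keys are sorted for a stable URL.
    """
    query = {k: v for k, v in sorted(filters.items()) if v is not None}
    url = f"{BASE}/{resource.strip('/')}/"
    return f"{url}?{urlencode(query)}" if query else url


filings_url = endpoint_url(
    "filings", filing_year=2024, filing_period="Q1", client_name="Pfizer"
)
lobbyists_url = endpoint_url("lobbyists")

print(filings_url)
print(lobbyists_url)
```

Building the URL separately from fetching it makes the query easy to log and to replay in a browser when a filter returns unexpected results.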

The bulk XML downloads are organized by year and filing type. Each year's XML file contains all LD-1 registrations and LD-2 quarterly reports filed in that calendar year. The XML schema is documented in the download directory. The XML files are substantially larger than what the API returns for equivalent queries (the 2024 annual XML is roughly 1.8 GB uncompressed) but are the appropriate format for full-corpus analysis rather than targeted queries.
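Files of that size are usually better streamed than loaded whole. The sketch below uses the standard library's incremental parser on a tiny in-memory stand-in; the element and attribute names (`Filing`, `Client`, `Amount`, and so on) are hypothetical and should be replaced with whatever the schema in the download directory actually defines.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in document. Element and attribute names are hypothetical.
SAMPLE = b"""<Filings>
  <Filing ID="a1" Year="2024" Amount="20000"><Client Name="Example Corp"/></Filing>
  <Filing ID="b2" Year="2024" Amount="50000"><Client Name="Sample Inc"/></Filing>
</Filings>"""


def iter_filings(source):
    """Yield (id, client name, amount) one filing at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "Filing":
            continue
        client = elem.find("Client")
        yield (
            elem.get("ID"),
            client.get("Name") if client is not None else None,
            float(elem.get("Amount") or 0),
        )
        elem.clear()  # release the children of the filing just processed


rows = list(iter_filings(io.BytesIO(SAMPLE)))
total = sum(amount for _id, _name, amount in rows)
print(rows, total)
```

For a real file, pass an open binary file handle (or a path) to `iter_filings` in place of the `BytesIO` object.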

LDA versus FARA: the boundary

The LDA and FARA occupy adjacent but distinct legal spaces. The LDA governs lobbying of the United States government on behalf of domestic clients and foreign commercial clients whose lobbying activities are not directed by a foreign government or political party. FARA governs activities undertaken on behalf of foreign principals—foreign governments, foreign political parties, and entities whose activities are directed or controlled by such principals.

The boundary between the two regimes is administered through a statutory exemption: Section 3(h) of FARA allows an agent whose foreign principal is not a foreign government or party, and whose activities meet the LDA lobbying definition, to satisfy their FARA obligation by registering under the LDA instead. This LDA exemption is what makes a DC firm's LDA filing covering, say, a foreign sovereign wealth fund's US investment interests legally compliant without a full FARA registration—provided the fund is not being directed by the foreign government itself.

The practical consequence for analysts: a registrant that appears in the LDA database with a client identified as a foreign commercial entity may be doing what a FARA registrant does, under an LDA filing. Cross-referencing LDA client country-of-origin fields against FARA registrant data identifies the population of foreign-interest lobbying that has been routed through the LDA exemption rather than the full FARA disclosure process. That population is systematically underdisclosed in journalism about foreign influence because FARA is the more recognized dataset.
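The cross-reference can be sketched as a simple set comparison. Everything below is illustrative: the records are made up, the field names (`client_country` in particular) are assumptions about how the data has been loaded, and matching registrants by normalized name is a simplification of what a real analysis would need.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so registrant names compare cleanly."""
    return " ".join(name.lower().split())


# Hypothetical, hand-made records; field names are illustrative only.
lda_filings = [
    {"registrant": "Example Strategies LLC", "client": "Northern Fund Ltd",
     "client_country": "NO"},
    {"registrant": "Example Strategies LLC", "client": "Domestic Widgets Inc",
     "client_country": "US"},
    {"registrant": "Capitol  Advisors", "client": "Harbor Shipping SA",
     "client_country": "PA"},
]
fara_registrants = {normalize("Capitol Advisors")}

# Step 1: keep LDA engagements whose client is a foreign entity.
foreign_clients = [f for f in lda_filings if f["client_country"] != "US"]

# Step 2: split by whether the registrant also appears in the FARA data.
lda_only = [f for f in foreign_clients
            if normalize(f["registrant"]) not in fara_registrants]
in_both = [f for f in foreign_clients
           if normalize(f["registrant"]) in fara_registrants]

print("LDA only:", [f["client"] for f in lda_only])
print("Also in FARA:", [f["client"] for f in in_both])
```

The `lda_only` list is the population described above: foreign-interest engagements disclosed under the LDA with no corresponding FARA registrant.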

Python snippet: pulling quarterly filings and aggregating by client and issue

The following script calls the Senate LDA REST API to retrieve all LD-2 filings for a given quarter and aggregates disclosed spending by client and by issue code. Dollar amounts are taken as-is from the income field (used by firms reporting lobbying income from a client) or the expenses field (used by large clients reporting their own lobbying expenditure). One or the other will be populated on a given filing, not both. Because the amount sits at the filing level, the script splits each filing's amount evenly across its distinct issue codes; that split is a modeling choice, not something the filing discloses.

import collections

import requests

BASE = 'https://lda.senate.gov/api/v1'

# Fetch one page of LD-2 filings for a given year and filing period
def fetch_filings(year, quarter, page=1):
    params = {
        'filing_year': year,
        'filing_period': quarter,  # Q1 | Q2 | Q3 | Q4 | H1 | H2 | ANNUAL
        'page': page,
        'page_size': 25,
    }
    r = requests.get(f'{BASE}/filings/', params=params, timeout=30)
    r.raise_for_status()
    return r.json()

# Walk every page for the requested period, yielding one filing at a time
def all_filings(year, quarter):
    page = 1
    while True:
        data = fetch_filings(year, quarter, page)
        yield from data['results']
        if not data.get('next'):
            break
        page += 1

# Amounts arrive as strings like '5000.00'; None or malformed values count as zero
def parse_amount(filing):
    # income is reported by lobbying firms; expenses by clients lobbying directly
    amount_str = filing.get('income') or filing.get('expenses') or '0'
    try:
        return float(amount_str)
    except (TypeError, ValueError):
        return 0.0

def main():
    # Aggregate disclosed lobbying amounts by client and issue code
    client_totals = collections.defaultdict(float)
    issue_totals = collections.defaultdict(float)

    for filing in all_filings(2025, 'Q1'):
        # The client object may be missing or null on some records
        client = (filing.get('client') or {}).get('name', 'Unknown')
        amount = parse_amount(filing)
        client_totals[client] += amount

        # Split the filing-level amount evenly across distinct issue codes so
        # issue totals sum to the overall total instead of double counting
        codes = {lob.get('general_issue_code') or 'UNK'
                 for lob in filing.get('lobbying_activities') or []}
        for code in codes:
            issue_totals[code] += amount / len(codes)

    # Top 10 clients by self-reported spend this quarter
    top_clients = sorted(client_totals.items(), key=lambda x: x[1], reverse=True)[:10]
    for name, total in top_clients:
        print(f'{name}: {int(total):,}')

    # Top issue codes by allocated spend
    top_issues = sorted(issue_totals.items(), key=lambda x: x[1], reverse=True)[:10]
    for code, total in top_issues:
        print(f'{code}: {int(total):,}')

if __name__ == '__main__':
    main()

A few implementation notes for production use. The lobbying_activities array on each filing is the granular record: each element has a general_issue_code, a specific_lobbying_issues text blob, and a foreign_entity_issues flag. The specific issues field is free text and contains bill numbers in a variety of formats (H.R. 4521, S. 1260, SA 2311) that require normalization before any legislative-history join. The API paginates at 25 results by default; set page_size=100 to reduce round trips. A full Q1 corpus across all registrants runs to approximately 12,000 filings.
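One way to approach the bill-number normalization is a single regular expression followed by a canonicalization step. The sketch below handles only the three formats mentioned above plus a couple of spelling variants; the pattern and the `normalize_bills` helper are illustrative assumptions, and free text will produce some false positives (for example "U.S. 2020") that need filtering downstream.

```python
import re

# House/Senate bills and Senate amendments in common spellings, e.g.
# "H.R. 4521", "HR4521", "S. 1260", "S 1260", "SA 2311", "S.Amdt. 2311".
BILL_RE = re.compile(
    r"\b(?P<kind>H\.?\s?R\.?|S\.?\s?A(?:mdt)?\.?|S\.?)\s?(?P<num>\d+)\b"
)


def normalize_bills(text: str) -> list:
    """Return canonical identifiers (HR4521, S1260, SA2311) in order of first appearance."""
    seen = []
    for match in BILL_RE.finditer(text):
        # Keep only the letters of the prefix, then collapse amendment spellings.
        kind = re.sub(r"[^A-Za-z]", "", match.group("kind")).upper()
        if kind.startswith("SA"):
            kind = "SA"
        ident = f"{kind}{int(match.group('num'))}"
        if ident not in seen:
            seen.append(ident)
    return seen


sample = "H.R. 4521, S. 1260, SA 2311; also HR4521 and S 1260."
print(normalize_bills(sample))
```

Joint resolutions, concurrent resolutions, and House amendments are deliberately left out; each would need its own alternative in the pattern.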

The revolving door in the lobbyist data

The lobbyist-level data from /api/v1/lobbyists/ includes a covered_official_position field that lists every qualifying federal post the lobbyist held in the two years preceding their first LDA registration. The field is free text, but the underlying data is structured well enough for pattern analysis: agency abbreviations and position titles follow recognizable conventions, and a regex- or NLP-based classifier can extract agency, branch (executive versus legislative), and approximate seniority level.
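A regex-based classifier of that kind might start as small as the sketch below. The keyword lists are illustrative and far from complete, `classify_position` is a hypothetical helper, and the "senior" flag is a crude title match rather than a real seniority scale.

```python
import re

# Illustrative keyword lists; a real classifier needs a much larger vocabulary.
EXECUTIVE = re.compile(
    r"\b(DOD|EPA|HHS|Treasury|White House|Department|Dept|Office of the Vice President)\b",
    re.IGNORECASE,
)
LEGISLATIVE = re.compile(
    r"\b(Senate|Senator|Sen|House|Representative|Rep|Committee|Congress\w*)\b",
    re.IGNORECASE,
)
SENIOR = re.compile(
    r"\b(Chief of Staff|Staff Director|Secretary|Administrator|General Counsel|Director)\b",
    re.IGNORECASE,
)


def classify_position(text: str) -> dict:
    """Guess branch and rough seniority from a free-text position description."""
    # Executive is checked first so that "White House" is not read as the House.
    if EXECUTIVE.search(text):
        branch = "executive"
    elif LEGISLATIVE.search(text):
        branch = "legislative"
    else:
        branch = "unknown"
    return {"branch": branch, "senior": bool(SENIOR.search(text))}


for position in (
    "Staff Director, Senate Finance Committee",
    "Deputy Assistant Secretary, Department of Defense",
    "Legislative Assistant, Rep. Smith",
):
    print(position, "->", classify_position(position))
```

Entries that mention both branches are assigned to the executive branch by this ordering, so mixed careers are better split into separate positions before classification.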

The pattern that emerges from this analysis is consistent across administrations. Lobbyists who previously held senior positions at a specific agency or committee are disproportionately concentrated among the registrants that lobby that agency or committee. Former Senate Finance Committee staff appear heavily among healthcare and tax registrants; former DOD and service branch officials appear among defense registrants; former EPA officials appear among energy and environmental registrants. This is both the expected consequence of expertise-driven hiring and the functional definition of the revolving door: the same knowledge and relationships that made a person valuable in government make them valuable as a lobbyist for the industries that government regulates.

The two-year cooling-off period mandated by HLOGA created a measurable timing pattern in the lobbyist data: registration dates cluster at the 24-month mark after departure from covered positions, as former officials wait out the minimum mandatory gap before formally registering as lobbyists. The actual influence brokering in that two-year window is not necessarily absent; it is simply not captured as lobbying in the LDA sense, because a former official who “advises” a client without making direct lobbying contacts is not required to register.

Connecting lobbying to legislative outcomes

The most analytically productive use of LDA data is connecting reported lobbying activity to measurable legislative outcomes. The methodology is multi-step and requires joining several datasets, but the core logic is straightforward.

First, extract bill numbers from the specific_lobbying_issues field for all LD-2 filings in a given congress. Normalize bill numbers to a canonical format (HR4521, S1260) and join against bill status data from Congress.gov or GovTrack. This produces a dataset of {bill, registrant, client, issue code, amount} tuples covering every disclosed lobbying effort on every identified piece of legislation.

Second, retrieve committee referral and markup data for the bills that advanced out of committee. The lobbyist-to-bill matrix can then be joined against committee membership data: bills marked up by committees where members received LD-203 contributions from lobbyists who lobbied that bill are the population of interest for investigating whether lobbying spending correlates with committee attention.

Third, for bills that passed a chamber vote, retrieve the roll-call record. Join the per-member vote against the per-member LD-203 contribution receipt in the same session. The question is whether members who received contributions from lobbyists disclosing activity on a bill voted in alignment with those lobbyists' clients' stated position more often than members who received no such contributions. The methodological standard for such an analysis requires controlling for party membership, committee assignment, and district characteristics. Vote-buying and contribution targeting (donors giving to members who already share their position) are both consistent with the observed correlation, and distinguishing them requires a structural model, not just a correlation coefficient.
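To make the comparison concrete, the sketch below computes alignment rates for contribution recipients and non-recipients within each party, which is the simplest form of the party control mentioned above. The eight vote rows are invented for illustration, and a stratified rate like this is a descriptive starting point, not the structural model the analysis ultimately needs.

```python
from collections import defaultdict

# Hypothetical roll-call rows:
# (member, party, received_contribution, voted_with_client_position)
votes = [
    ("A", "D", True, True),
    ("B", "D", True, True),
    ("C", "D", False, True),
    ("D", "D", False, False),
    ("E", "R", True, False),
    ("F", "R", False, False),
    ("G", "R", True, True),
    ("H", "R", False, False),
]


def alignment_by_group(rows):
    """Return {(party, received): share of members voting with the client position}."""
    counts = defaultdict(lambda: [0, 0])  # key -> [aligned votes, total votes]
    for _member, party, received, aligned in rows:
        cell = counts[(party, received)]
        cell[0] += int(aligned)
        cell[1] += 1
    return {key: aligned / total for key, (aligned, total) in counts.items()}


rates = alignment_by_group(votes)
for (party, received), rate in sorted(rates.items()):
    label = "recipients" if received else "non-recipients"
    print(f"{party} {label}: {rate:.2f}")
```

Comparing recipients with non-recipients inside each party removes the most obvious confounder, but a gap that survives this step still cannot separate influence from targeting.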

Journalists have applied this framework most successfully to regulatory rulemakings rather than floor votes, because regulatory comment periods have clearer timing—the comment deadline creates a natural cutoff for which lobbying registrations preceded the rulemaking—and because agency rule text changes between proposed and final rules can be compared directly against positions stated in lobbying registrations and comment letters. The EPA Clean Air Act rulemakings, the CMS drug-pricing rules under the Inflation Reduction Act, and FCC broadband classification proceedings have all been analyzed using this LDA-to-rulemaking methodology, producing specific findings about which industry positions were incorporated into final rule text and how much lobbying spending preceded those incorporations.

Practical notes

  • Client names are not normalized. The same corporation appears under “Pfizer Inc.,” “Pfizer, Inc.,” and “Pfizer” across different registrant filings. Any client-level aggregation requires a normalization pass: strip legal suffixes, standardize capitalization, and apply fuzzy matching to consolidate variants. The Senate API does not provide a canonical client identifier across registrants.
  • Dollar amounts carry known biases at the rounding boundaries. The $10,000 rounding rule means that actual spending of $14,999 and $15,000 is reported as $10,000 and $20,000 respectively, while every amount from $15,000 to $24,999 is reported identically as $20,000. Spending distributions derived from LDA data show artificial clustering at multiples of $10,000. Do not interpret the distribution as reflecting true spending granularity; use it only for order-of-magnitude comparisons.
  • Terminated registrations remain in the dataset. LD-2 filings with filing_type=TERMINATION close out an engagement but the registrant and client records persist. Filter on filing type or on the registrant's active/terminated status field when you want only ongoing engagements.
  • The specific issues text is the richest but least structured field. Bill numbers, agency proceedings, regulatory docket numbers, and descriptive text all appear in a single free-text blob. A basic NLP pipeline extracting bill number patterns ([HS]\.?\s?(?:[RJ]\.?\s?)?\d+ and variants, with the second letter optional so that plain Senate bills such as S. 1260 also match), agency names from a controlled vocabulary, and CFR part numbers captures the majority of structured references without requiring full parsing.
  • Coalition lobbying understates per-company spending. A trade association LD-2 lists the association as the registrant and the client. Individual member companies' contributions to association lobbying budgets do not appear anywhere in the LDA. The LDA figure for, say, the American Bankers Association represents ABA's own spending; it does not capture the sum of member bank lobbying budgets that fund the association.
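The client-name normalization pass from the first note can be sketched with the standard library alone. The suffix list, the 0.9 similarity threshold, and the helper names are assumptions chosen for illustration; a production pipeline would tune the threshold and review merged groups by hand.

```python
import re
from difflib import SequenceMatcher

# Common legal suffixes to strip; the list is illustrative, not exhaustive.
SUFFIXES = re.compile(
    r"\b(incorporated|inc|corporation|corp|company|co|llc|ltd|lp|plc)\b\.?",
    re.IGNORECASE,
)


def normalize_client(name: str) -> str:
    """Lowercase, strip legal suffixes and punctuation, collapse whitespace."""
    name = SUFFIXES.sub(" ", name.lower())
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())


def group_clients(names, threshold=0.9):
    """Greedily group raw names whose normalized forms are near-identical."""
    groups = {}  # normalized key -> list of raw name variants
    for raw in names:
        key = normalize_client(raw)
        # Reuse the first existing key that is similar enough, if any.
        match = next(
            (k for k in groups if SequenceMatcher(None, k, key).ratio() >= threshold),
            None,
        )
        groups.setdefault(key if match is None else match, []).append(raw)
    return groups


variants = ["Pfizer Inc.", "Pfizer, Inc.", "Pfizer", "Pfizer Incorporated", "Merck & Co."]
for key, members in group_clients(variants).items():
    print(key, "<-", members)
```

Greedy grouping depends on input order and compares every new name against every existing key, so for the full client table it is worth blocking on the first few characters before running the similarity check.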

For the FARA dataset that covers lobbying on behalf of foreign governments and foreign principals—the adjacent registration system and how it relates to LDA exemptions: Foreign agents in plain sight: mapping DC's hidden influence network with FARA data →

For the FEC campaign finance dataset and how LD-203 lobbying contribution reports join against Schedule A receipts to map pay-to-play patterns: Follow the money: mapping dark money and super PAC flows with FEC bulk data →

For the STOCK Act congressional trading disclosures and how they combine with LD-203 contribution data to surface conflicts of interest around committee assignments: Trading on the inside: using STOCK Act filings to track congressional stock transactions →