
Trading on the inside: using STOCK Act filings to track congressional stock transactions

11 min read · AI Analytics
Regulatory data · STOCK Act · Congress · Trading · Conflicts of interest · Disclosure

In February 2020, before markets had registered the full weight of what was coming, several members of Congress sold millions of dollars in stock. Senator Richard Burr, chair of the Senate Intelligence Committee, disposed of between $628,000 and $1.72 million in shares across 33 transactions on a single day, February 13. Senator Kelly Loeffler sold between $1.28 million and $3.1 million between late January and mid-February. Both had attended classified briefings on the emerging COVID-19 threat. Both filed the disclosures the STOCK Act required: weeks later, in periodic transaction reports buried inside a government website that most Americans have never heard of.

The Stop Trading on Congressional Knowledge Act, signed by President Obama in April 2012 after passing with rare bipartisan consensus, did not prohibit members of Congress from trading stocks. It required them to disclose those trades. The distinction is not subtle. What the law created is a paper trail: a disclosure regime that, if you can extract structured data from the filings (scanned PDFs in the House, HTML reports in the Senate), produces a running, if delayed, feed of congressional portfolio activity that can be cross-referenced against legislation, committee assignments, and classified briefing schedules.

This post covers the statutory mechanics of STOCK Act disclosure, where the raw filings live, how researchers and data vendors have built structured datasets from them, what the key fields look like, and what patterns the data reveals when you join it against public records on legislation and committee membership.

What the STOCK Act requires

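The Stop Trading on Congressional Knowledge Act (the STOCK Act of 2012, Pub. L. 112-105, 126 Stat. 291) amended the Ethics in Government Act of 1978, whose financial disclosure provisions are now codified at 5 U.S.C. §§ 13101–13111 (formerly 5 U.S.C. App. §§ 101–111). The periodic transaction reporting obligation sits at 5 U.S.C. § 13105(l), with the contents of reports defined in § 13104. It applies to: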

  • Members of Congress — all 535 voting members of the Senate and House. Non-voting delegates and the Resident Commissioner of Puerto Rico are also covered.
  • Senior executive branch officials — the President, Vice President, and senior officials at GS-15 and above or SES level.
  • Congressional staff above an income threshold — employees earning more than 75% of a member's base pay must file if they have access to non-public information. This threshold covers roughly 1,800 to 2,000 senior congressional staffers at any given time.

The disclosable transactions are: purchases, sales, and exchanges of securities with a value exceeding $1,000. “Securities” under the Act covers stocks, bonds, commodity futures, and options. Real estate transactions are not covered. Transactions in Treasury securities and securities of entities regulated by the filer (a limited carve-out that has been criticized as narrow in practice) are exempt. Transactions by a dependent child are covered; transactions by a spouse are covered in most circumstances.

The deadline is 30 days after the member knew or should have known of a transaction, or 45 days after the transaction occurred—whichever comes first. In practice, the 45-day window is the governing limit for most filings. A transaction occurring on January 1 must be reported by February 15. The fine for late filing: $200. That figure has not changed since the law was enacted in 2012, and it has not been indexed to inflation.
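
The deadline rule above reduces to taking the earlier of two dates. A minimal sketch, assuming calendar days and a hypothetical helper name `ptr_deadline` (not part of any official tooling):

```python
from datetime import date, timedelta
from typing import Optional


def ptr_deadline(transaction_date: date, notified_date: Optional[date] = None) -> date:
    """Earlier of (notification + 30 days) and (transaction + 45 days).

    If the notification date is unknown, fall back to the 45-day outer limit.
    """
    outer_limit = transaction_date + timedelta(days=45)
    if notified_date is None:
        return outer_limit
    return min(notified_date + timedelta(days=30), outer_limit)


# The example from the text: a January 1 transaction is due by February 15.
print(ptr_deadline(date(2024, 1, 1)))
# If the filer was notified on January 3, the 30-day clock binds first.
print(ptr_deadline(date(2024, 1, 1), date(2024, 1, 3)))
```

The same function can be reused later to flag late filings by comparing its result against the filing date.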

Where the raw disclosures live

The House and Senate handle disclosures differently, and that difference has major implications for anyone trying to build structured data from them.

The House Clerk publishes periodic transaction reports (PTRs) at:

https://disclosures.house.gov/FinancialDisclosure

The House disclosure system is organized by year. Each year's filings are indexed in an XML file listing all reports filed that year. The individual disclosures themselves are scanned PDFs. The PDF quality varies from crisp digital scans to barely legible photocopies of faxed forms. There is no machine-readable structured export. The House Clerk provides a search interface, but it returns PDF links, not data. If you want a table of every transaction a given member reported, you need to download the PDFs and extract them, either via OCR or via one of the structured datasets maintained by third parties.

The Senate Electronic Financial Disclosure system (eFDS), accessible at:

https://efds.senate.gov/

has required electronic filing since 2012 and provides a more usable search interface. Senate PTRs are available as HTML-formatted reports that can be scraped without OCR, and for some periods the Senate system provides XML exports. The Senate data is structurally cleaner, but the search interface is rate-limited and the XML export functionality is inconsistent across filing years. The Senate also publishes an annual financial disclosure report (FD) for each member that covers full-year holdings—a different instrument from the PTR that captures all positions held as of December 31, not just in-period transactions.

The structured data problem

Because House PTRs are scanned PDFs and the Senate system does not provide a clean bulk export, anyone wanting to do systematic analysis of congressional trading has historically faced a structured-data problem. Several organizations have solved it in different ways.

Quiver Quantitative maintains a congressional trading API at https://api.quiverquant.com/beta/historical/congresstrading/ that covers House and Senate PTRs going back to 2019. The API is paid (tiered pricing with a limited free tier). Quiver extracts transactions from both the Senate's HTML filings and the House's PDFs using a combination of parsing and OCR, normalizes ticker symbols, and enriches the output with market data. The historical endpoint accepts a ticker symbol and returns all congressional transactions in that security. A separate endpoint returns all transactions for a specific member, keyed by bioguide ID.

Capitol Trades at capitoltrades.com is a web-based aggregator that displays congressional trades in a browsable format with filtering by party, chamber, asset type, and date range. Capitol Trades also offers a CSV export function for filtered result sets, making it useful for researchers who need structured data without building a parser. The site covers both chambers and updates within 24–48 hours of new filings appearing in the House and Senate systems. The underlying methodology uses web scraping of the Senate eFDS and OCR-based extraction from House PDFs.

ProPublica Congress API provides member-level data (bioguide IDs, committee assignments, vote records) but does not itself surface STOCK Act trading data. It is valuable as a join target: a member's bioguide ID from ProPublica can be used to link trading records from Quiver or Capitol Trades against committee membership, vote history, and legislative sponsorship data, all of which ProPublica makes available in clean JSON.

The New York Times, Business Insider, and other newsrooms have built one-off structured datasets from STOCK Act filings for specific investigations. The Times published a dataset covering 2020–2023 for its analysis of congressional trading. These datasets are typically not maintained and go stale, but they provide verification anchors for checking whether a given scraper or API is returning accurate transaction records.

Key fields in a structured STOCK Act dataset

Once extracted from PDFs and normalized, a STOCK Act transaction record contains the following fields. Not all sources provide all fields; asset type and option-specific metadata are frequently missing from OCR-based extraction.

  • legislator_id — bioguide ID (e.g., B001135 for Richard Burr). Stable across sessions and chambers. The authoritative source is https://bioguide.congress.gov/.
  • transaction_date — the date the trade was executed, as reported by the member. This is the primary date for event studies. It is sometimes a date range (e.g., “2020-02-10 to 2020-02-14”) when a member batches transactions.
  • filing_date — the date the PTR was submitted to the House Clerk or Senate eFDS. The lag between transaction_date and filing_date is the disclosure delay. The maximum permitted is 45 days; violations are defined as any filing where this exceeds 45 days.
  • asset_name — the name of the security as reported by the member. Quality is variable: some members report “Apple Inc.”, others report “AAPL”, others report partial names or misspellings. Ticker normalization against a reference dataset (e.g., Quiver's enriched output or OpenFIGI) is required before systematic analysis.
  • ticker — the exchange ticker symbol, normalized. May be missing for bonds, commodity futures, or options where no ticker is standard.
  • asset_type — one of: stock, bond, option, cryptocurrency, exchange-traded fund, or other. Members are required to specify. The majority of disclosed transactions are in common stock or ETFs.
  • transaction_type — purchase, sale, sale (partial), or exchange. Options add exercise and assignment. Most analytics treat partial sales as sales for event-study purposes.
  • amount_range — the value bracket of the transaction. The STOCK Act does not require disclosure of exact amounts; it requires disclosure within ranges:
    • $1,001 – $15,000
    • $15,001 – $50,000
    • $50,001 – $100,000
    • $100,001 – $250,000
    • $250,001 – $500,000
    • $500,001 – $1,000,000
    • $1,000,001 – $5,000,000
    • $5,000,001 – $25,000,000
    • $25,000,001 – $50,000,000
    • Over $50,000,000

    The range structure means aggregate portfolio value calculations require assumptions. The midpoint of each bucket is the standard convention; the lower bound is used for conservative estimates. For the top bracket, no upper bound exists, making large-holder analysis inherently imprecise.

  • chamber — House or Senate. Critical for joining against ProPublica committee data, which is chambered.
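
Because amount_range is a text label rather than a number, any aggregate needs a parsing step first. A minimal sketch of the midpoint convention described above, assuming labels keep the "$low – $high" or "Over $x" shape; the function name `parse_amount_range` is hypothetical:

```python
import re
from typing import Optional, Tuple


def parse_amount_range(label: str) -> Tuple[int, Optional[int], Optional[float]]:
    """Return (low, high, midpoint) in dollars for a bracket label.

    Open-ended labels ("Over $x") have no upper bound, so high and midpoint
    are None and the caller must choose an assumption (e.g. use the lower bound).
    """
    numbers = [int(n.replace(",", "")) for n in re.findall(r"\$([\d,]+)", label)]
    if not numbers:
        raise ValueError(f"no dollar amounts found in {label!r}")
    low = numbers[0]
    if len(numbers) == 1:
        return low, None, None
    high = numbers[1]
    return low, high, (low + high) / 2


print(parse_amount_range("$15,001 – $50,000"))
print(parse_amount_range("$1,001 - $15,000"))
```

Returning None for the open-ended bracket, rather than inventing a cap, keeps the imprecision visible to whatever code sums the values.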

The 45-day lag and non-compliance

The 45-day disclosure window is the central weakness of the STOCK Act as a tool for real-time public accountability. A member who purchases stock the day before a significant legislative vote has 45 days to disclose that trade. By the time the disclosure appears in the House Clerk's system, the relevant legislative event is weeks in the past, the market has already moved, and any abnormal return generated by information advantage has been fully realized.

In practice, the actual disclosure lag is substantially longer than 45 days for a significant fraction of filings. Studies of STOCK Act compliance have found median filing delays of 25–35 days for on-time filers, but a right tail of violations extending well beyond 200 days. Multiple members have filed PTRs covering transactions from a year or more prior. Under the Act, the penalty for any violation—whether a single day late or 400 days late—is a flat $200 fine. There is no scaling with the transaction size, no interest charge, no escalation for repeat violations.

The Fine Report, maintained by the Office of Congressional Ethics and accessible via House Clerk records, tracks which members have paid STOCK Act fines. Between 2021 and 2024, more than 70 House members and at least 20 senators were found to have filed late disclosures. Several members paid the $200 fine multiple times within a single Congress—an outcome that, given the scale of the trades being disclosed, suggests the penalty functions as a cost of doing business rather than a deterrent.

No watchdog organization currently publishes a systematic real-time tracker of STOCK Act violations. The closest proxy is the Capitol Trades platform, which surfaces the filing date alongside the transaction date, allowing users to compute the lag manually.
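
Computing the lag from the two dates is a one-line subtraction. A minimal sketch using made-up records (the legislator IDs and dates below are placeholders, not real filings):

```python
from datetime import date
from statistics import median

MAX_LAG_DAYS = 45  # outer statutory limit discussed above

# Hypothetical records: (legislator_id, transaction_date, filing_date)
filings = [
    ("X000001", date(2024, 1, 1), date(2024, 1, 29)),
    ("X000002", date(2024, 1, 1), date(2024, 2, 15)),
    ("X000003", date(2023, 3, 10), date(2024, 1, 5)),
]


def disclosure_lag(transaction_date: date, filing_date: date) -> int:
    """Calendar days between the trade and its disclosure."""
    return (filing_date - transaction_date).days


lags = {member: disclosure_lag(tx, filed) for member, tx, filed in filings}
# A filing is late only if the lag exceeds the limit; exactly 45 days is on time.
late = {member: lag for member, lag in lags.items() if lag > MAX_LAG_DAYS}

print("median lag (days):", median(lags.values()))
print("late filers:", late)
```

This ignores the 30-day notification clock, which would require the notification date from the PTR itself.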

Committee assignments and the conflict-of-interest pattern

The analytical core of congressional trading research is the committee assignment join. Members of Congress sit on committees with jurisdiction over specific sectors of the economy. The Senate Intelligence Committee receives classified briefings on national security threats. The House Energy and Commerce Committee oversees energy markets, telecommunications, and pharmaceutical regulation. The Senate Banking Committee shapes financial regulation. The Armed Services committees in both chambers authorize defense spending. Each of these assignments gives members access to non-public information with direct implications for asset valuations.

ProPublica's Congress API exposes committee assignments by member and session:

# Members of the Senate in the 118th Congress (responses include bioguide IDs)
curl "https://api.propublica.org/congress/v1/118/senate/members.json" \
  -H "X-API-Key: YOUR_KEY"

# Roster of a specific committee; the path takes a single chamber and the
# API's own committee code (SLIN for the Senate Select Committee on Intelligence)
curl "https://api.propublica.org/congress/v1/118/senate/committees/SLIN.json" \
  -H "X-API-Key: YOUR_KEY"

The Senate Select Committee on Intelligence has the SSCI designation. The corresponding House committee is HPSCI (House Permanent Select Committee on Intelligence). Joining the Quiver or Capitol Trades transaction dataset against SSCI/HPSCI membership produces a filter that isolates all trades by Intelligence Committee members. The February 2020 COVID briefing trades became public because journalists performed this join manually—matching the timeline of Senate Intelligence Committee briefings against PTR filings by SSCI members.
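
The join itself is small once both sides are keyed by bioguide ID. A minimal sketch, assuming the roster has already been fetched into a set and trades are dicts using the field names from this post; the tickers, the second legislator ID, and the event date are placeholders:

```python
from datetime import date, timedelta

# Placeholder roster; in practice this comes from a committee membership source.
ssci_roster = {"B001135"}  # Richard Burr's bioguide ID, as cited earlier

# Hypothetical trade records.
trades = [
    {"legislator_id": "B001135", "ticker": "AAA", "transaction_type": "sale",
     "transaction_date": date(2020, 2, 13)},
    {"legislator_id": "Z000000", "ticker": "BBB", "transaction_type": "sale",
     "transaction_date": date(2020, 2, 13)},
    {"legislator_id": "B001135", "ticker": "CCC", "transaction_type": "purchase",
     "transaction_date": date(2019, 6, 3)},
]


def committee_trades_near_event(trades, roster, event_date, window_days=14):
    """Trades by roster members within +/- window_days of an event date."""
    start = event_date - timedelta(days=window_days)
    end = event_date + timedelta(days=window_days)
    return [
        t for t in trades
        if t["legislator_id"] in roster and start <= t["transaction_date"] <= end
    ]


# event_date is an illustrative placeholder, not a documented briefing date.
hits = committee_trades_near_event(trades, ssci_roster, event_date=date(2020, 2, 10))
for t in hits:
    print(t["legislator_id"], t["ticker"], t["transaction_type"], t["transaction_date"])
```

The window width is a judgment call; a wider window catches more trades but weakens any inference about timing.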

The semiconductor legislation pattern from 2022 is a second prominent example. The CHIPS and Science Act (Pub. L. 117-167), which appropriated $52.7 billion in subsidies for domestic semiconductor manufacturing, was in active legislative development throughout late 2021 and 2022. Disclosures filed by members of the relevant committees (Senate Commerce and House Science, Space, and Technology) show purchases of semiconductor company equities during this period. The tickers involved (NVDA, INTC, AMD, QCOM, TSM) belong to companies directly exposed to the bill, which was signed in August 2022. Defense spending authorization cycles produce a similar pattern: Armed Services Committee members with positions in Raytheon, Lockheed Martin, Northrop Grumman, and L3Harris are a persistent feature of the dataset year over year.

Python approach: event study on STOCK Act data

The standard method for testing whether congressional trades outperform the market is an event study using a CAPM or Fama–French benchmark. The procedure is:

  1. Download the Capitol Trades CSV export for a given date range and filter to purchases only. Normalize tickers against a reference list (OpenFIGI or the CRSP/Compustat security master for academic work).
  2. For each transaction, define an event window: typically [−5, +30] trading days around the transaction_date. The pre-event days capture any price run-up just before the trade, which a window starting at the trade date would miss.
  3. Estimate expected returns using a market model over a pre-event estimation window (e.g., [−252, −30] trading days). The benchmark is typically the S&P 500 total return index or the three Fama–French factors.
  4. Compute cumulative abnormal returns (CAR) over the event window: the difference between realized returns and the predicted returns from the market model.
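  5. Aggregate CARs across all events, stratified by committee membership, transaction type (purchase vs. sale), and amount range. Test for statistical significance using a cross-sectional t-test or the Boehmer, Musumeci, and Poulsen (BMP) standardized cross-sectional test, which is robust to event-induced increases in variance. Because many trades share event dates, treat the resulting p-values cautiously or use a clustering-adjusted variant.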

import pandas as pd
import yfinance as yf

# Load Capitol Trades CSV export
trades = pd.read_csv("capitol_trades_export.csv", parse_dates=["transaction_date"])

# Filter to purchases of US-listed equities with a usable ticker
purchases = trades[
    (trades["asset_type"] == "stock") &
    (trades["transaction_type"] == "purchase") &
    (trades["ticker"].notna())
].copy()

def daily_returns(symbol, start, end):
    """Daily close-to-close returns as a Series."""
    close = yf.download(symbol, start=start, end=end, progress=False)["Close"]
    # Some yfinance versions return a one-column DataFrame here
    if isinstance(close, pd.DataFrame):
        close = close.iloc[:, 0]
    return close.pct_change().dropna()

# Market benchmark (SPY) covering every estimation and event window
spy_ret = daily_returns("SPY", "2018-01-01", "2025-12-31")

results = []
for _, row in purchases.iterrows():
    t0 = row["transaction_date"]
    ticker = row["ticker"]
    try:
        # ~420 calendar days back covers 252 trading days before t0;
        # 60 calendar days forward covers 30 trading days after it
        stock_ret = daily_returns(ticker, t0 - pd.Timedelta(days=420),
                                  t0 + pd.Timedelta(days=60))
    except Exception:
        continue  # unknown or delisted ticker, network error, empty result
    pre = stock_ret[stock_ret.index < t0]
    post = stock_ret[stock_ret.index >= t0].iloc[:30]
    if len(pre) < 252 or len(post) < 30:
        continue
    # Estimation window: trading days -252 through -31 relative to the event
    est = pd.concat([pre.iloc[-252:-30], spy_ret], axis=1, join="inner").dropna()
    est.columns = ["stock", "bench"]
    # OLS market model: stock = alpha + beta * bench
    beta = est["stock"].cov(est["bench"]) / est["bench"].var()
    alpha = est["stock"].mean() - beta * est["bench"].mean()
    # Event window: 30 trading days starting at t0
    bench_post = spy_ret.reindex(post.index)
    ar = post - (alpha + beta * bench_post)
    results.append({
        "member": row["legislator_id"],
        "ticker": ticker,
        "transaction_date": t0,
        "car_30d": ar.sum(),
        "amount_range": row["amount_range"],
    })

df = pd.DataFrame(results)
print(df.groupby("amount_range")["car_30d"].describe())

Academic studies using this method have found statistically significant positive CARs for congressional stock purchases, particularly in the $100,001–$250,000 and higher amount ranges. A 2004 study by Ziobrowski et al. in the Journal of Financial and Quantitative Analysis found Senate portfolios outperformed the market by roughly 12 percentage points annually. A 2011 follow-up for the House found roughly 6 percentage points of annual outperformance. Post-STOCK Act studies have found attenuated but still positive abnormal returns, suggesting that disclosure without prohibition has not eliminated the underlying information advantage.
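
The cross-sectional t-test named in step 5 asks whether the mean CAR across events differs from zero. A minimal sketch with the standard library; the CAR values are invented for illustration and the function name `car_t_stat` is hypothetical:

```python
from math import sqrt
from statistics import mean, stdev


def car_t_stat(cars):
    """t-statistic for H0: mean CAR == 0, treating events as independent."""
    n = len(cars)
    if n < 2:
        raise ValueError("need at least two events")
    spread = stdev(cars)  # sample standard deviation (n - 1 denominator)
    if spread == 0:
        raise ValueError("all CARs identical; t-statistic undefined")
    return mean(cars) / (spread / sqrt(n))


# Invented 30-day CARs for one stratum (e.g. one amount_range bucket).
example_cars = [0.021, -0.004, 0.035, 0.012, -0.018, 0.027, 0.009, 0.015]
print(f"mean CAR = {mean(example_cars):.4f}, t = {car_t_stat(example_cars):.2f}")
```

The independence assumption is the weak point: trades clustered on the same dates share market shocks, so this statistic tends to overstate significance for congressional data.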

The family exemption gap

The STOCK Act requires disclosure of transactions by “dependent children” of covered officials, and it covers transactions made by or for the account of the filer's spouse. But the practical enforceability of the spousal disclosure is limited. A transaction in a spouse's brokerage account is required to be reported, but the Form PTR provides no mechanism to verify whether all spousal accounts are being disclosed. Unlike insider-trading enforcement under SEC Rule 10b-5, where the SEC can subpoena brokerage records, STOCK Act compliance is self-reported. The House and Senate ethics committees can investigate, but investigations are rare and have not historically produced prosecutions for disclosure omissions.

The blind trust exemption is a related gap. Members who place their assets in a qualified blind trust approved by their chamber's ethics committee are exempt from transaction-level reporting. The theory is that a truly blind trust removes the member's knowledge of specific holdings. In practice, the approval process for blind trusts is slow, and members with large equity portfolios frequently maintain non-blind brokerage accounts alongside retirement accounts for years before transitioning. Blind trusts are also optional. A member who does not establish one is choosing to remain personally aware of their holdings and trading activity.

The result is a disclosure regime that captures a meaningful but structurally incomplete picture of congressional financial activity. The transactions that do appear are rich with analytical signal. The transactions that disappear into spousal accounts, undisclosed outside accounts, or blind-trust structures are systematically absent.

Building a pipeline

A production pipeline for tracking STOCK Act data at scale has three components:

Ingestion. The House Clerk's XML index at https://disclosures.house.gov/FinancialDisclosure#Search lists all PTRs filed in a given year. The index file for each year (e.g., https://disclosures.house.gov/public_disc/financial-pdfs/2024FD.zip) is a ZIP archive containing an XML manifest of all filings. Each entry in the manifest includes the member's name, the filing date, and a URL to the PDF. Automating download of new PDFs requires polling this manifest daily and detecting new entries. The Senate eFDS requires scraping the search interface or using the Capitol Trades or Quiver API as a proxy.
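
The polling step can be sketched as: unzip the archive, parse the XML manifest, and diff it against the document IDs already seen. The element names below (`Member`, `DocID`, `FilingType`, `FilingDate`) are assumptions about the manifest layout and should be checked against a real download. The demo builds a synthetic archive in memory rather than hitting the network.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def parse_manifest(zip_bytes: bytes) -> list:
    """Return one dict per filing entry found in the archive's XML manifest."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        xml_name = next(n for n in zf.namelist() if n.lower().endswith(".xml"))
        root = ET.fromstring(zf.read(xml_name))
    # Assumed layout: each <Member> element holds one filing's fields as children.
    return [
        {child.tag: (child.text or "").strip() for child in member}
        for member in root.iter("Member")
    ]


def new_filings(entries: list, seen_ids: set) -> list:
    """Entries whose (assumed) DocID field has not been processed yet."""
    return [e for e in entries if e.get("DocID") not in seen_ids]


def _synthetic_zip() -> bytes:
    """Build a tiny stand-in archive so the sketch runs offline."""
    xml = (
        b"<FinancialDisclosure>"
        b"<Member><Last>Doe</Last><FilingType>P</FilingType>"
        b"<FilingDate>1/15/2024</FilingDate><DocID>20000001</DocID></Member>"
        b"<Member><Last>Roe</Last><FilingType>P</FilingType>"
        b"<FilingDate>1/16/2024</FilingDate><DocID>20000002</DocID></Member>"
        b"</FinancialDisclosure>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("2024FD.xml", xml)
    return buf.getvalue()


entries = parse_manifest(_synthetic_zip())
fresh = new_filings(entries, seen_ids={"20000001"})
print([e["DocID"] for e in fresh])
```

In a real pipeline the bytes would come from an HTTP download of the yearly ZIP, and `seen_ids` from the storage layer.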

Extraction. House PDFs require OCR. The open-source Tesseract engine (via pytesseract) produces usable output for clean scans; the Python pdfplumber library handles digital PDFs (those generated by software rather than a scanner) more reliably. The target table structure per page is: asset name, transaction type, transaction date, notification date, amount range. The notification date is when the member became aware of the transaction—relevant for options exercises and trust transactions where the execution date may differ from knowledge date.
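
Once OCR or pdfplumber has produced text, each table row has to be split back into the five target fields. A minimal sketch, assuming each row lands on one line in the order listed above, with single-letter type codes (P, S, S (partial), E) and MM/DD/YYYY dates. That layout is an assumption about the extracted text, and real output will need more tolerant patterns.

```python
import re
from datetime import datetime

ROW = re.compile(
    r"^(?P<asset_name>.+?)\s+"
    r"(?P<transaction_type>S \(partial\)|P|S|E)\s+"
    r"(?P<transaction_date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<notification_date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<amount_range>\$[\d,]+\s*-\s*\$[\d,]+|Over \$[\d,]+)$"
)


def parse_row(line: str):
    """Split one extracted text line into PTR fields, or return None."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    try:
        for key in ("transaction_date", "notification_date"):
            row[key] = datetime.strptime(row[key], "%m/%d/%Y").date()
    except ValueError:
        return None  # OCR produced an impossible date such as 13/45/2020
    return row


# Hypothetical extracted line, not taken from a real filing.
sample = "Example Corp (EXM) [ST] S (partial) 02/10/2020 02/12/2020 $15,001 - $50,000"
print(parse_row(sample))
```

Rows that fail to parse are better queued for manual review than silently dropped, since OCR errors concentrate in the low-quality scans.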

Enrichment and storage. Extracted records need ticker normalization (OpenFIGI's free API maps ISIN, CUSIP, and security names to tickers), committee assignment joins (ProPublica Congress API by bioguide ID and session), and deduplication (the same PTR can appear in both House and Senate if a member served in both chambers in the same year, which is uncommon but possible during mid-term transitions). Cloudflare D1 or a local SQLite instance handles the storage requirements comfortably; the full STOCK Act dataset from 2012 to present is under 2GB in structured form, well within D1's limits.
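
For the storage side, a uniqueness constraint lets the database handle deduplication. A minimal sketch with the standard library's sqlite3; the table layout and the choice of natural key are assumptions, and the sample row is invented:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS transactions (
    legislator_id    TEXT NOT NULL,
    chamber          TEXT NOT NULL,
    transaction_date TEXT NOT NULL,
    filing_date      TEXT NOT NULL,
    ticker           TEXT,
    asset_name       TEXT NOT NULL,
    transaction_type TEXT NOT NULL,
    amount_range     TEXT NOT NULL,
    -- Assumed natural key: chamber is excluded so a record seen in both
    -- chambers' systems collapses to one row.
    UNIQUE (legislator_id, transaction_date, asset_name,
            transaction_type, amount_range)
);
"""

INSERT = """
INSERT OR IGNORE INTO transactions
    (legislator_id, chamber, transaction_date, filing_date,
     ticker, asset_name, transaction_type, amount_range)
VALUES
    (:legislator_id, :chamber, :transaction_date, :filing_date,
     :ticker, :asset_name, :transaction_type, :amount_range)
"""


def insert_transactions(conn: sqlite3.Connection, rows: list) -> int:
    """Insert rows, skipping duplicates; return how many were actually added."""
    before = conn.total_changes
    conn.executemany(INSERT, rows)
    conn.commit()
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

row = {
    "legislator_id": "X000001", "chamber": "House",
    "transaction_date": "2024-01-01", "filing_date": "2024-01-29",
    "ticker": "EXM", "asset_name": "Example Corp",
    "transaction_type": "purchase", "amount_range": "$1,001 - $15,000",
}
print(insert_transactions(conn, [row, dict(row, chamber="Senate")]))
```

The tradeoff is that two genuinely separate trades with identical fields on the same day also collapse into one row; keeping the source document ID as an extra column would make that case recoverable.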

What the data does not show

The STOCK Act dataset reveals what members reported. It does not reveal what they traded in accounts they chose not to disclose (whether by omission or by routing through a spouse). It does not reveal what they chose not to trade based on information received in a classified briefing. Strategic inaction (declining to buy ahead of legislation expected to hurt a sector, or holding cash through a period when a member had reason to expect a market decline) leaves no trace in a transaction-based disclosure system.

The 45-day lag also means the dataset is a historical record, not a real-time feed. By the time a trade appears in the system, the information that may have motivated it can be a month and a half stale even when the filer complies, and far staler when the filing is late. The most market-sensitive events (classified intelligence briefings, closed committee hearings on regulatory action, private briefings from agency heads before public announcements) leave, at most, a signal that can only be detected in aggregate, across many members and many events, using the statistical methods described above.

Despite these limitations, the STOCK Act dataset remains one of the richer public windows into the intersection of political power and financial markets. The structure of the disclosure—transaction-level, dated, nominally complete for all members in all securities above $1,000—is more granular than most comparable disclosure regimes in comparable democracies. The weakness is not in what is disclosed but in what is never disclosed, never prohibited, and never prosecuted.
