Once a quarter, the largest pools of professionally managed money in the country (Berkshire Hathaway, Bridgewater, the great pension funds, and thousands of hedge funds) have to file a single form with the SEC that lists the stocks they hold. Form 13F is that filing, and it is the federal database behind every “what the smart money bought” headline you have ever read. Surfaced from EDGAR as roughly 718,000 reported holding rows, one row per manager per security per quarter, it is the closest thing the public has to a standing window into what the big money owns, and, just as importantly, a window that is long-only, lagged by 45 days, and partial by design.
This article covers what the 13F dataset is and the grain at which it is stored; the statutory frame—Section 13(f) of the Securities Exchange Act, the $100 million threshold, and the 45-day quarterly filing deadline; exactly which securities count as 13(f) securities and which assets the form silently omits; the structured XML information table that has made the data reliably machine-readable since the 2013 mandate; the difference between a 13F-HR holdings report and a 13F-NT notice, and how confidential treatment lets managers delay disclosing positions; the practice of whale-watching and the ways it is routinely over-read; how the holdings table joins to the rest of the securities-data universe through the CUSIP and the manager CIK; a Python workflow that pulls a manager's latest 13F-HR information table from EDGAR and aggregates its top holdings by value; and the caveats—long-only bias, staleness, window dressing, and identifier drift—that every analyst must internalize before drawing a conclusion from a 13F.
What the dataset is
Form 13F is a quarterly report that certain large institutional investment managers must file with the SEC, disclosing the equity-style securities over which they exercise investment discretion. The report is filed through EDGAR, the SEC's Electronic Data Gathering, Analysis, and Retrieval system, as a 13F-HR filing (“HR” for holdings report). Inside each 13F-HR is a structured table—the information table—that lists every reportable position the manager held as of the last day of the quarter: the issuer, the security's CUSIP, its market value, and the number of shares or the principal amount. Pulled together across all filers and quarters, those information tables are the source of the holdings dataset, which comprises roughly 718,000 reported holding rows.
In our database this record is stored as the table sec_13f_holdings, with the grain of one row per reported holding—that is, one row per manager × security × quarter. A single manager that holds two hundred positions in a given quarter contributes two hundred rows for that period; the next quarter it contributes a fresh set. The columns capture who filed, for which quarter, what they held, and how much:
manager_name -- the filing institutional investment manager
cik -- the manager's Central Index Key (filer join key)
accession_number -- the EDGAR accession that contained this row
period_of_report -- quarter-end the holdings are reported as of
name_of_issuer -- issuer name as the manager reported it
title_of_class -- security class (e.g. COM, CL A, NOTE)
cusip -- 9-character CUSIP (the security join key)
value -- market value of the position (USD)
ssh_prnamt -- share count OR principal amount
ssh_prnamt_type -- SH (shares) or PRN (principal)
investment_discretion -- SOLE, DEFINED (shared), or OTHER
put_call -- PUT or CALL for an option position, else null
voting_authority -- sole / shared / none voting share counts
Two columns are load-bearing. The cusip is the security join key: the nine-character CUSIP uniquely identifies the issue, and it is the key that ties a 13F position to the same security in every other dataset, including a company's filings, its other holders, its insider transactions, and its price history. The cik is the filer join key: the Central Index Key is the persistent identifier EDGAR assigns to every filer, and it is what ties all of a manager's 13F filings together across quarters and links them to anything else the manager files. The ssh_prnamt and ssh_prnamt_type columns must be read together: the amount is a share count when the type is SH and a principal amount when the type is PRN. The put_call field, when populated, flags that the row is an option position rather than a direct holding, which is essential because an option line is not the same economic exposure as owning the underlying shares.
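The rule that the amount, its type, and the put/call flag must be read together can be sketched as a small classifier. This is a minimal illustration, assuming each row is a plain dict keyed by the column names listed above; the function name and the CUSIP values are hypothetical placeholders, not real securities.

```python
def classify_row(row):
    """Label a holdings row by what its ssh_prnamt actually measures."""
    if row.get("put_call"):
        # PUT or CALL populated: an option line, not a direct holding.
        return "option_" + row["put_call"].lower()
    if row.get("ssh_prnamt_type") == "PRN":
        # Principal amount (for example, convertible debt), not a share count.
        return "principal"
    return "shares"  # SH: a direct share count


# Placeholder rows for illustration only.
rows = [
    {"cusip": "000000AA1", "ssh_prnamt": 5_000_000, "ssh_prnamt_type": "PRN", "put_call": None},
    {"cusip": "000000101", "ssh_prnamt": 1_200, "ssh_prnamt_type": "SH", "put_call": None},
    {"cusip": "000000101", "ssh_prnamt": 300, "ssh_prnamt_type": "SH", "put_call": "PUT"},
]
labels = [classify_row(r) for r in rows]
print(labels)
```

Separating the three kinds of row before summing anything keeps share counts, principal amounts, and option lines from being added together as if they were the same quantity.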
What it is and the Section 13(f) regulatory frame
The reporting obligation comes from Section 13(f) of the Securities Exchange Act of 1934, added by Congress in the 1975 amendments to the Act. The purpose was disclosure and market transparency: as institutional money came to dominate the equity markets, Congress wanted a public, standardized record of the holdings of the largest institutional managers, so that issuers, regulators, and the public could see the concentration of institutional ownership and study its effect on the markets. The SEC implemented the mandate through Rule 13f-1 and Form 13F, and the form has been the standing quarterly disclosure of large institutional equity holdings ever since.
The obligation attaches to an institutional investment manager, a broad category that includes any entity that invests in, or buys and sells, securities for its own account, and any natural person or entity that exercises investment discretion over the account of another person. In practice the filers are investment advisers, banks, insurance companies, broker-dealers, pension funds, and corporations that manage their own securities portfolios. The trigger is a size threshold: a manager must file if it exercises investment discretion over at least $100 million in 13(f) securities. The $100 million is measured on the last trading day of any month in a calendar year; once a manager crosses it, it becomes subject to the reporting requirement and must file for the fourth quarter of that year and for at least the first three quarters of the following year. This is why the population of filers expands and contracts with the markets (a rising market pushes more managers over the threshold) and why a manager can drop out of the dataset without having changed its strategy at all.
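The threshold test described above reduces to a check over twelve month-end values. The sketch below is only an illustration of that logic: the function name is hypothetical, and the inputs are assumed to be the aggregate market value of 13(f) securities under the manager's discretion on the last trading day of each month.

```python
THRESHOLD_USD = 100_000_000  # Section 13(f) size threshold


def crosses_threshold(month_end_values):
    """True if any month-end value in the calendar year reaches the threshold.

    month_end_values: aggregate USD value of 13(f) securities under
    discretion on the last trading day of each month (assumed input).
    """
    return any(v >= THRESHOLD_USD for v in month_end_values)


# A manager that touches $100 million once, in a single month, is in scope.
print(crosses_threshold([90_000_000, 100_000_000, 80_000_000]))
```

Because one qualifying month is enough, a brief market rally can pull a manager into the filer population even if its assets fall back below the line afterwards.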
The cadence is the single most important structural fact about the data. A 13F is a quarterly report, filed within 45 days after the end of each calendar quarter. A manager's holdings as of the close of business on March 31 are reported in a 13F-HR that need not be filed until mid-May; the December 31 snapshot need not appear until mid-February. That 45-day window is built into the rule, and it is the origin of the data's defining limitation: a 13F is a photograph of a quarter-end that is typically about six weeks old on the day it is filed, and it stays the newest public record until the next quarter's filing arrives, by which point it is roughly four and a half months old. The positions it shows may have been sold the day after the snapshot. The period_of_report column records the quarter the holdings are as of; the filing date records when the world found out, and the gap between them is the lag every analyst must respect.
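The deadline arithmetic is simple enough to show directly. This is a minimal sketch with hypothetical function names; it computes the nominal 45-calendar-day deadline only and deliberately ignores any adjustment for deadlines that fall on weekends or holidays, which a production pipeline would need to check against the SEC's rules.

```python
from datetime import date, timedelta


def filing_deadline(period_of_report):
    """Nominal deadline: 45 calendar days after the quarter-end."""
    return period_of_report + timedelta(days=45)


def disclosure_lag_days(period_of_report, filing_date):
    """Days between the snapshot date and the day the filing became public."""
    return (filing_date - period_of_report).days


print(filing_deadline(date(2024, 3, 31)))    # 2024-05-15
print(filing_deadline(date(2023, 12, 31)))   # 2024-02-14
print(disclosure_lag_days(date(2024, 3, 31), date(2024, 5, 15)))  # 45
```

Storing the lag alongside each filing makes it easy to see which managers file early and which wait until the last permitted day.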
The investment-discretion concept deserves a word, because it defines what gets reported and by whom. The obligation runs to the manager that exercises investment discretion—the authority to decide what to buy and sell—over the securities, not necessarily to the beneficial owner. A pension fund that hires three external sub-advisers and gives each discretion over a slice of the portfolio creates a situation where the sub-advisers, not the pension fund, may be the 13F filers for the slices they manage. The investment_discretion column on each row records whether the filer's discretion over that position is sole, shared (“defined”), or other—a detail that matters when reconciling overlapping reports, because the same shares can legitimately appear on more than one manager's 13F under different discretion codes without being double-counted in the market.
Which securities count, and which assets the form omits
Form 13F does not cover everything a manager owns. It covers only 13(f) securities—a specific, enumerated universe—and the boundary of that universe is what makes a 13F a partial portrait even of the long side of a portfolio. The SEC publishes an official list of 13(f) securities every quarter, and a security must appear on that list to be reportable. The list is, in essence, the universe of exchange-traded equity-style instruments.
What is on the list: exchange-listed stocks (common and many preferred shares), equity options on those securities, convertible debt securities, certain warrants, and the great majority of exchange-traded funds (ETFs). The ETF coverage is significant: because so many managers express market and sector views through ETFs, a large share of reported 13F value sits in a relatively small number of broad-market and sector ETFs, and any analysis of “what institutions own” has to decide how to treat those fund wrappers, which are themselves baskets of the underlying stocks. Each position on the form carries the security's issuer name, its CUSIP, its market value as of the quarter-end, and the share count or, for convertible debt, the principal amount.
What is not on the form is just as important, and it is the heart of why a 13F is never a complete portfolio. The form reports long positions only. Short positions—bets that a security will fall—are entirely excluded; a hedge fund that is net short a stock can appear in 13F data as a holder of the very stock it is betting against, because the long leg of the trade is reported and the short leg is invisible. Beyond shorts, the form omits cash and cash equivalents, most fixed-income holdings (ordinary bonds that are not 13(f) securities), commodities and futures, foreign securities that are not US-listed, private holdings, and short-side derivatives. A manager whose strategy is dominated by shorting, by currencies, by physical commodities, or by foreign markets can have a 13F that captures only a sliver of what it actually does. The single most common analytical mistake with this dataset is to treat a manager's reported long positions as if they were the manager's portfolio. They are one leg of it, and often not the most informative one.
The structured information table and the 2013 mandate
For much of its history Form 13F was a machine-readability nightmare. The holdings were filed as free-form text and, later, as inconsistently formatted documents, so extracting a clean table of positions meant parsing prose, wrestling with idiosyncratic layouts, and tolerating frequent errors. The data existed, but turning it into an analyzable dataset was laborious and unreliable.
That changed when the SEC mandated that the holdings be filed as a structured XML information table. Since the 2013 effective date of that requirement, every 13F-HR carries its positions in a standardized XML schema: each holding is an infoTable element with child elements for the issuer name, the title of class, the CUSIP, the value, a shrsOrPrnAmt block giving the share or principal amount and its type, the put/call indicator, the investment-discretion code, and the voting-authority breakdown. The standardization is what made the modern 13F dataset possible: a parser can read the same fields out of every filer's table, the CUSIP can be relied on as a clean join key, and values and share counts come through as numbers rather than as text to be salvaged. The sec_13f_holdings table is, in effect, the accumulation of those XML information tables, normalized into one row per holding. One consequence worth flagging: the SEC's later amendments changed the reporting convention for the value field—older filings reported value in thousands of dollars, while more recent filings report value in whole dollars—so any longitudinal analysis has to detect the convention by period and scale accordingly, or it will misstate market values by a factor of a thousand across the transition.
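The value-scaling step can be sketched as a single normalizing function. Two things here are assumptions rather than facts taken from this article: the cutoff date, which reflects my understanding of when the whole-dollar convention took effect and should be verified against the SEC's adopting release, and the choice to key on the filing date rather than the period of report. The function and constant names are hypothetical.

```python
from datetime import date

# ASSUMPTION: filings made on or after this date report `value` in whole
# dollars; earlier filings report it in thousands. Verify before relying on it.
DOLLAR_CONVENTION_START = date(2023, 1, 3)


def value_in_dollars(value, filing_date, cutoff=DOLLAR_CONVENTION_START):
    """Return the reported value normalized to whole US dollars."""
    if filing_date < cutoff:
        return value * 1000  # older convention: thousands of dollars
    return value             # newer convention: already whole dollars


print(value_in_dollars(1500, date(2022, 11, 14)))  # 1500000
print(value_in_dollars(1500, date(2023, 2, 14)))   # 1500
```

Keeping the cutoff as a parameter means the rule can be corrected in one place if a sanity check against prices shows that a particular filer switched conventions on a different date.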
13F-HR, 13F-NT, and confidential treatment
Not every 13F filing contains a holdings table, and not every holding a manager owns appears in the one it files. Two mechanisms—the 13F-NT notice and confidential treatment—mean that the visible dataset is, by design, less than the full set of reportable positions, and an analyst who is not aware of them will silently undercount.
The first is the distinction between a 13F-HR holdings report and a 13F-NT notice. When the same securities are reportable by more than one related manager—say, a parent and several affiliated advisers—the rules avoid double-counting by letting one manager file the full holdings report (the 13F-HR, which carries the information table) while the others file a 13F-NT notice that simply points to the manager doing the reporting and carries no holdings of its own. A 13F-NT is therefore a real, valid filing that contains zero rows for the holdings dataset. Any pipeline that pulls “13F filings” indiscriminately and expects an information table in each will choke on the notices; the holdings live only in the 13F-HR (and in 13F-HR/A amendments). This is also why following a manager's positions sometimes requires following a different CIK—the affiliate that actually holds the table—than the one whose name is on the strategy.
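A pipeline can avoid choking on notices by filtering on form type before it looks for an information table. The sketch below assumes the input has the shape of the EDGAR submissions API's recent-filings block, with parallel "form" and "accessionNumber" lists; the function name and the sample accession strings are hypothetical.

```python
# Only these form types carry an information table.
HOLDINGS_FORMS = {"13F-HR", "13F-HR/A"}


def holdings_filings(recent):
    """Return accession numbers of filings that should contain holdings.

    recent: dict of parallel lists with at least "form" and
    "accessionNumber" keys (assumed shape).
    """
    return [
        acc
        for form, acc in zip(recent["form"], recent["accessionNumber"])
        if form in HOLDINGS_FORMS
    ]


# Placeholder data for illustration only.
sample = {
    "form": ["13F-NT", "13F-HR", "10-K", "13F-HR/A"],
    "accessionNumber": ["acc-1", "acc-2", "acc-3", "acc-4"],
}
print(holdings_filings(sample))
```

A manager whose list comes back empty but who does file 13F-NT notices is a cue to look up the affiliate that files the holdings report on its behalf.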
The second, and more consequential, mechanism is confidential treatment. The SEC permits a manager to request that the disclosure of certain positions be delayed (kept confidential) where the manager can justify the request, typically on the ground that immediate disclosure would reveal an ongoing acquisition program and let others trade ahead of it. When a request is granted, those positions are omitted from the publicly filed information table for a period and disclosed only later, often through an amended filing. The practical effect is that a 13F can be both stale and incomplete at the same instant it is filed: the positions it omits under confidential treatment are exactly the ones the manager is most actively building, which are frequently the most interesting. Berkshire Hathaway's confidential accumulation of certain large stakes, later revealed when the confidential treatment lapsed, is the canonical example: the very fact that a position was hidden was itself a signal. For the dataset this means a clean, fully reconciled view of a manager's reportable longs sometimes only exists after the amendments land, and the original filing should be read as a floor on the holdings, not a complete enumeration.
Whale-watching and how it is over-read
The popular use of 13F data is whale-watching: tracking what famous managers bought and sold from one quarter to the next, and treating those moves as signals to follow. When the mid-February and mid-May filing deadlines pass, financial media fills with stories about what Berkshire Hathaway, Bridgewater, and the marquee hedge funds added, trimmed, or exited. The appeal is obvious—these are sophisticated investors with research budgets most people cannot match, and the 13F is a rare, free, standardized look over their shoulder. Computing quarter-over-quarter changes from the dataset is straightforward: because each row carries a period and a CUSIP, the same manager's holdings in two consecutive quarters can be differenced to produce a clean list of new positions, closed positions, increases, and decreases.
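The differencing step can be sketched in a few lines. This is an illustration under simplifying assumptions: each quarter is reduced to a dict mapping CUSIP to share count for one manager, option and principal-amount rows are assumed to have been filtered out beforehand, and the function name and CUSIP keys are hypothetical placeholders.

```python
def diff_holdings(prev, curr):
    """Compare two quarters of {cusip: shares} and label each change."""
    changes = {}
    for cusip in sorted(prev.keys() | curr.keys()):
        before = prev.get(cusip, 0)
        after = curr.get(cusip, 0)
        if before == 0 and after > 0:
            label = "new"
        elif before > 0 and after == 0:
            label = "closed"
        elif after > before:
            label = "increase"
        elif after < before:
            label = "decrease"
        else:
            label = "unchanged"
        changes[cusip] = (label, after - before)
    return changes


# Placeholder holdings for two consecutive quarters.
q1 = {"AAA": 100, "BBB": 50, "CCC": 10}
q2 = {"AAA": 150, "CCC": 10, "DDD": 20}
result = diff_holdings(q1, q2)
print(result)
```

Differencing on share counts rather than on value keeps price moves from being mistaken for trading decisions.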
The problem is that whale-watching is over-read more often than it is read correctly, and every one of its failure modes traces back to a limitation already described. The staleness means the “move” you are following happened up to four-and-a-half months ago, and the manager may have reversed it entirely before you ever saw it. The long-only bias means an apparent bullish position may be the long leg of a hedge, a merger-arbitrage trade, or a convertible-bond arbitrage in which the manager is simultaneously short the same name in a way the 13F cannot show, so “Fund X bought Stock Y” can be precisely backwards as a directional read. The discretion question means a position may reflect a client's mandate rather than the manager's conviction. And because confidential treatment hides the positions a manager is most actively building, the most important moves are sometimes the ones that are not in the filing at all. Used carefully, as a quarterly census of disclosed institutional long exposure with its lag and its omissions held firmly in mind, 13F data is genuinely valuable. Used as a real-time trade-copying signal, it is a trap.
Joining to the securities-data universe
The 13F holdings table is most powerful not in isolation but as one node in the integrated securities-data graph, and it has two clean join keys that connect it outward: the CUSIP on the security side and the manager CIK on the filer side. Three joins matter most.
The first is the CUSIP join to the issuer and its other holders. Because every 13F position carries the security's CUSIP, the holdings can be pivoted from the manager's point of view (what does this manager own?) to the issuer's point of view (who owns this issuer?). Aggregating all 13F rows for a given CUSIP across managers reconstructs the disclosed institutional ownership of a single security—how concentrated it is, how it is shifting quarter to quarter, and which managers are entering or exiting. The CUSIP is also the bridge to the rest of the company's data: with a CUSIP-to-issuer crosswalk it joins to the issuer's own SEC filings and to its ticker and price history, so a 13F position can be valued, compared to the float, and tracked over time. The CUSIP is the single most important field for any issuer-centric or cross-manager analysis.
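The pivot from the manager's view to the issuer's view can be sketched as a group-by on CUSIP followed by a concentration measure. The Herfindahl-style index used here is my own choice of metric for illustration, not something this article prescribes; the function names and the sample rows are hypothetical, and the input is assumed to be share-count rows for a single period.

```python
from collections import defaultdict


def ownership_by_cusip(rows):
    """rows: iterable of (cik, cusip, shares). Returns {cusip: {cik: shares}}."""
    out = defaultdict(lambda: defaultdict(int))
    for cik, cusip, shares in rows:
        out[cusip][cik] += shares
    return {cusip: dict(holders) for cusip, holders in out.items()}


def holder_concentration(holders):
    """Sum of squared holder weights among disclosed holders (1.0 = one holder)."""
    total = sum(holders.values())
    if not total:
        return 0.0
    return sum((s / total) ** 2 for s in holders.values())


# Placeholder rows: three managers hold security X, one holds Y.
rows = [(1, "X", 600), (2, "X", 300), (3, "X", 100), (1, "Y", 50)]
own = ownership_by_cusip(rows)
print(own["X"], round(holder_concentration(own["X"]), 4))
```

The weights are relative to disclosed 13F holders only, so the index describes concentration among reporting institutions, not among all owners of the security.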
The second is the CIK join across a manager's filings. The manager's Central Index Key ties together all of its 13F-HR filings, quarter after quarter, which is what makes the quarter-over-quarter change analysis possible in the first place. The same CIK also links the manager to anything else it files with the SEC—a Schedule 13D or 13G when a stake crosses the beneficial-ownership thresholds, a Form 3/4/5 when the manager is an insider, an N-PORT when the manager is also a registered fund. The CIK turns the 13F from a standalone quarterly snapshot into one strand of a manager's complete federal disclosure history.
The third, broader join is to the other ownership-disclosure regimes that surround the 13F. The 13F shows quarterly long positions above a manager-level size threshold; the Schedule 13D/13G regime shows holdings that cross five percent of a single issuer's class, on a much faster timeline; the N-PORT regime shows the full, monthly-resolution portfolios of registered funds, including the short and fixed-income positions a 13F omits. Reconciling a manager's 13F against its 13D/13G filings and, where it runs registered funds, against N-PORT is how an analyst assembles a fuller picture than any single regime provides—and how the 13F's blind spots get filled in by the regimes designed around different questions.
Analytical uses
A quarterly, manager-resolved, security-resolved census of disclosed institutional long holdings supports a distinctive set of analyses that no single filing can.
Quarter-over-quarter position changes by manager is the most immediate use, and the one whale-watching rests on. Differencing a manager's holdings across two consecutive period_of_report values, joined on CUSIP, yields a clean ledger of new buys, complete exits, and adds and trims—the raw material for tracking a manager's evolving exposure, provided the lag and the long-only caveats are carried alongside every number.
Institutional ownership and concentration by security inverts the view: aggregating all managers' positions in one CUSIP measures how much of a security's disclosed float sits with reporting institutions, how concentrated that ownership is, and how crowded a name has become. That crowdedness itself carries risk, because heavily co-owned positions can unwind together. Rolling this up by sector or by manager type surfaces where institutional money is flowing and where it is retreating.
Manager-similarity and crowding analysis uses the overlap between managers' holdings to map which funds hold similar books, a measure of strategy clustering and of how exposed a group of managers is to the same shocks. And portfolio replication and reverse-engineering attempts to infer a manager's strategy from the disclosed longs. It is valuable, but it is the use most vulnerable to the form's omissions, since a replication built on longs alone will diverge sharply from any manager whose returns come substantially from the short, fixed-income, or derivative exposures the 13F cannot see. Across all of these, the discipline is the same: the 13F measures disclosed institutional long exposure at quarter-ends, seen on a delay, and every metric inherits those qualifiers.
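Manager similarity can be measured in several ways; the sketch below shows two simple ones as an illustration, not as the method any particular analysis uses. Jaccard overlap compares the sets of CUSIPs held, and the weighted overlap sums the smaller of the two portfolio weights for each shared name. All function names and sample portfolios are hypothetical.

```python
def jaccard_overlap(cusips_a, cusips_b):
    """Share of names held by either manager that are held by both."""
    a, b = set(cusips_a), set(cusips_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weighted_overlap(weights_a, weights_b):
    """Sum over shared CUSIPs of the smaller portfolio weight (0 to 1)."""
    shared = weights_a.keys() & weights_b.keys()
    return sum(min(weights_a[c], weights_b[c]) for c in shared)


# Placeholder portfolios expressed as {cusip: weight of reported long value}.
fund_a = {"AAA": 0.5, "BBB": 0.5}
fund_b = {"BBB": 0.25, "CCC": 0.75}
print(jaccard_overlap(fund_a, fund_b), weighted_overlap(fund_a, fund_b))
```

The set-based measure treats a token position and a core position alike, so the weighted version is usually the more informative of the two when position sizes differ widely.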
Python workflow: pulling a manager's latest 13F from EDGAR
The script below pulls a manager's most recent 13F-HR information table from SEC EDGAR and aggregates its top holdings by value. It uses the EDGAR submissions API to find the latest 13F-HR for a given CIK, lists the filing's index to locate the XML information table (whose file name is not fixed), parses the infoTable elements into one row per position, collapses multiple lines for the same CUSIP into a single position, and prints the reported long market value, the number of distinct securities, and the top holdings with each position's share of the portfolio. No API key is required, but the SEC requires a descriptive User-Agent header identifying the requester, and it rate-limits clients that omit it. The example uses Berkshire Hathaway's investment manager, which files under CIK 1067983.
import requests
import xml.etree.ElementTree as ET
import pandas as pd

# SEC EDGAR -- no API key required for public filings. The SEC asks that
# every request set a descriptive User-Agent that identifies you and gives
# a contact address; requests without one are throttled or rejected.
HEADERS = {"User-Agent": "AI Analytics research contact@example.com"}
SUBMISSIONS = "https://data.sec.gov/submissions/CIK{cik}.json"


def latest_13f_hr(cik):
    # The submissions API returns a manager's recent filing history. Find
    # the most recent 13F-HR (the holdings report; 13F-NT is a notice that
    # the holdings are reported by another manager and carries no table).
    cik10 = str(cik).zfill(10)
    resp = requests.get(SUBMISSIONS.format(cik=cik10), headers=HEADERS,
                        timeout=60)
    resp.raise_for_status()
    recent = resp.json()["filings"]["recent"]
    for form, acc, period in zip(recent["form"],
                                 recent["accessionNumber"],
                                 recent["reportDate"]):
        if form == "13F-HR":
            acc_nodashes = acc.replace("-", "")
            base = (f"https://www.sec.gov/Archives/edgar/data/"
                    f"{int(cik)}/{acc_nodashes}/")
            return {"accession": acc, "period": period, "base": base}
    raise ValueError("No 13F-HR found for this CIK")


def information_table(base):
    # The structured holdings live in the XML INFORMATION TABLE. Its file
    # name is not fixed, so list the filing index and pick the .xml that
    # contains <infoTable> elements rather than the cover-page primary doc.
    resp = requests.get(base + "index.json", headers=HEADERS, timeout=60)
    resp.raise_for_status()
    idx = resp.json()
    xml_files = [f["name"] for f in idx["directory"]["item"]
                 if f["name"].lower().endswith(".xml")]
    for name in xml_files:
        raw = requests.get(base + name, headers=HEADERS, timeout=60).content
        if b"infoTable" in raw:
            return raw
    raise ValueError("No information table XML in this filing")


def _text(parent, tag):
    # Return the stripped text of a child element, or None if it is absent.
    el = parent.find(tag) if parent is not None else None
    return el.text.strip() if el is not None and el.text else None


def parse_holdings(xml_bytes):
    # Parse first, then drop the namespace from every tag. This handles both
    # prefixed tags (ns1:, n1:) and a default xmlns, which a plain string
    # replace on the prefixes would miss.
    root = ET.fromstring(xml_bytes)
    for el in root.iter():
        if isinstance(el.tag, str) and "}" in el.tag:
            el.tag = el.tag.split("}", 1)[1]
    # One row per <infoTable>: issuer, CUSIP, value, and share/principal.
    rows = []
    for it in root.iter("infoTable"):
        shrs = it.find("shrsOrPrnAmt")
        rows.append({
            "issuer": _text(it, "nameOfIssuer"),
            "class_title": _text(it, "titleOfClass"),
            "cusip": _text(it, "cusip"),
            # 13F values are reported in dollars (post-2022 amendment;
            # older filings reported in thousands -- check the period).
            "value": float(_text(it, "value") or 0),
            "amount": float(_text(shrs, "sshPrnamt") or 0),
            "amount_type": _text(shrs, "sshPrnamtType"),
            "put_call": _text(it, "putCall"),
        })
    cols = ["issuer", "class_title", "cusip", "value",
            "amount", "amount_type", "put_call"]
    return pd.DataFrame(rows, columns=cols)


def top_holdings(cik, n=15):
    f = latest_13f_hr(cik)
    df = parse_holdings(information_table(f["base"]))
    # Collapse multiple lines for the same security (different lots, or
    # shares plus options) into one position per CUSIP, summing value.
    # Note that option lines are not the same exposure as shares; filter
    # on put_call first if that distinction matters for the analysis.
    by_cusip = (df.groupby("cusip", dropna=False)
                  .agg(issuer=("issuer", "first"), value=("value", "sum"))
                  .reset_index()
                  .sort_values("value", ascending=False))
    total = by_cusip["value"].sum()
    print(f"Period of report: {f['period']}  accession: {f['accession']}")
    print(f"Reported long market value: ${total:,.0f}")
    print(f"Distinct securities (by CUSIP): {by_cusip['cusip'].nunique():,}\n")
    print("Top holdings by reported value:")
    for _, r in by_cusip.head(n).iterrows():
        share = r["value"] / total if total else 0
        print(f"  {str(r['issuer'])[:34]:<34} {str(r['cusip']):<9} "
              f"${r['value']:>16,.0f} {share:6.2%}")
    return by_cusip


# Berkshire Hathaway's investment manager files under CIK 1067983.
top_holdings(1067983)
Two practical notes apply. First, the value-units convention is a genuine trap: filings from before the SEC's amendment report the value field in thousands of dollars, while more recent filings report it in whole dollars, so a script that aggregates value across the transition without detecting the convention by period will overstate or understate market values by a factor of a thousand. A production pipeline should branch on the period_of_report (or inspect the schema version) and scale accordingly before summing. Second, the script reads a single 13F-HR; the quarter-over-quarter change analysis that drives most real use requires pulling two consecutive periods and differencing on CUSIP—and a complete view of a manager that uses 13F-NT notices or confidential treatment may require following an affiliate's CIK and waiting for amended filings, neither of which the single-filing pull captures. For national-scale work—reconstructing institutional ownership across every issuer, or building the full cross-manager crowding analysis—the bulk 13F datasets the SEC publishes are far more efficient than walking EDGAR one CIK at a time, and they ship with the period and unit metadata needed to handle the value-scaling correctly.
Limitations and analytical caveats
Form 13F is the most comprehensive public record of large institutional equity holdings in the United States, but it carries structural limitations so consequential that misreading them is the rule rather than the exception. An analyst must internalize all of them before drawing a conclusion.
It is long-only, so it is never a complete portfolio. The form reports only long positions in 13(f) securities. Shorts, cash, most bonds, commodities, currencies, foreign-listed securities, and short-side derivatives are all invisible. A reported long can be the benign leg of a hedge, an arbitrage, or a market-neutral book, which means a directional reading of any single position can be exactly wrong. Treating a manager's 13F as its portfolio, rather than as the disclosed long slice of it, is the foundational error, and almost every over-read of the data descends from it.
It is stale by 45 days, and often much more. The holdings are a quarter-end snapshot disclosed up to 45 days later, so the public always sees the data through a delay of six weeks to four-and-a-half months, during which the positions may have been reversed. Confidential treatment stretches the delay further for exactly the positions a manager is most actively building. The 13F is authoritative for what a manager held at a past quarter-end; it is not, and was never designed to be, a real-time view of what a manager holds now.
Window dressing and discretionary reporting distort the snapshot. Because the report is a quarter-end photograph, it is vulnerable to window dressing—the practice of adjusting a portfolio near the reporting date to present a more flattering picture than the manager held during the quarter. The snapshot also says nothing about intra-quarter trading: a manager that bought and fully sold a large position between quarter-ends leaves no trace at all. And the discretion codes mean a position may reflect a client mandate rather than the manager's own view. The data faithfully records the quarter-end state; it does not record the manager's conviction, its timing, or what happened in between.
CUSIP and identifier drift complicate joins, and the value-units convention changed. Issuer names are reported as free text and vary across filers and quarters, so the CUSIP, not the name, must carry every security join; but CUSIPs themselves can change with corporate actions, reorganizations, and reclassifications, so a longitudinal join on CUSIP must account for the crosswalk of changes. The value field switched from thousands to whole dollars across the SEC's amendment, so any analysis spanning that boundary must scale by period or report nonsense. And filing errors (transposed CUSIPs, mis-stated share counts, omitted positions) are not rare; the data should be sanity-checked against issuer floats and prices, not taken on faith.
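One inexpensive sanity check for mistyped identifiers is the CUSIP's own check digit, the ninth character, which is computed from the first eight. The sketch below implements the standard check-digit calculation as I understand it; the function names are hypothetical, and the example uses Apple's common-stock CUSIP, 037833100. Passing the check does not prove a CUSIP is assigned to a real security, and the check catches many but not all transposition errors.

```python
SPECIAL = {"*": 36, "@": 37, "#": 38}


def cusip_check_digit(base8):
    """Compute the check digit for the first eight CUSIP characters."""
    total = 0
    for i, ch in enumerate(base8.upper()):
        if "0" <= ch <= "9":
            v = int(ch)
        elif "A" <= ch <= "Z":
            v = ord(ch) - ord("A") + 10
        elif ch in SPECIAL:
            v = SPECIAL[ch]
        else:
            raise ValueError(f"invalid CUSIP character: {ch!r}")
        if i % 2 == 1:      # double every second character (positions 2, 4, 6, 8)
            v *= 2
        total += v // 10 + v % 10
    return (10 - total % 10) % 10


def is_valid_cusip(cusip):
    """True if the string is nine characters and its check digit matches."""
    if len(cusip) != 9 or not ("0" <= cusip[8] <= "9"):
        return False
    try:
        return cusip_check_digit(cusip[:8]) == int(cusip[8])
    except ValueError:
        return False


print(is_valid_cusip("037833100"))  # well-formed
print(is_valid_cusip("038733100"))  # two digits transposed
```

Rows that fail the check are good candidates for manual review before they are allowed into any join on CUSIP.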
Held with these caveats in mind, the sec_13f_holdings table is a uniquely valuable resource: a quarterly, manager-resolved, security-resolved, machine-readable census of the disclosed long holdings of the largest institutional managers in the country—the standing federal record of what the big money owns, as long as you remember that it shows you only the long side, only at quarter-ends, and only after a deliberate delay.
Related writing
SEC N-PORT Mutual Fund Holdings: The Federal Database Behind Every Fund Portfolio Position — N-PORT is the regime that fills the 13F's blind spots for registered funds, disclosing the full monthly-resolution portfolio—including the short and fixed-income positions a long-only 13F cannot show—so reconciling a manager's 13F against its N-PORT filings assembles a far more complete view than either provides alone.
SEC Schedule 13D Filings: The Federal Database Behind Activist Investor Stakes — Where the 13F is a slow quarterly census of every large manager's longs, Schedule 13D is the fast, issuer-level disclosure triggered when a holder crosses five percent of a class with intent to influence, and joining the two on the manager's CIK shows the same investor's broad book and its activist stakes side by side.
SEC EDGAR Company Registry: The Federal Index That Resolves Every Public Company — The 13F's CUSIP and CIK join keys only become powerful when they resolve to real issuers and filers, and the EDGAR company registry is the index that turns a CUSIP into a named issuer and a CIK into a known manager, anchoring every cross-dataset join the holdings table supports.