
SEC EDGAR XBRL Financials: Machine-Readable Fundamentals for Every Public Company

14 min read · AI Analytics

Every public company in the United States files its quarterly and annual financial statements with the SEC. Since 2009 for the largest filers, and since 2011 for all filers, those statements have been required to be tagged in XBRL (eXtensible Business Reporting Language), a structured XML dialect that maps each reported number to a standardized concept name from the US-GAAP taxonomy. The SEC extracts those tags into two complementary data products: a set of quarterly bulk downloads called the Financial Statements and Notes dataset, and a per-company JSON API called EDGAR Company Facts. The result is a machine-readable archive of the income statement, balance sheet, cash flow statement, and statement of stockholders' equity for every US public company, covering the 10-K and 10-Q filings submitted since the mandate took effect.

This article covers what the EDGAR XBRL financial dataset contains, the US-GAAP taxonomy structure, how to access data via the Company Facts API, the Submissions API for filing history, and the frames endpoint for cross-sectional screening. It also covers the bulk FSN dataset, the most significant data quality issues—XBRL extension elements, restated periods, and unit inconsistencies—and what practitioners in factor investing, financial journalism, and academic research have built on top of this data. A Python implementation demonstrates fetching cross-sectional quarterly revenue data and computing year-over-year growth rates across all reporting companies.

The XBRL mandate and what it produces

The SEC's XBRL mandate required filers to accompany their traditional HTML financial statements with an Interactive Data submission: a structured XBRL document tagging each reported number to a concept name from the US-GAAP taxonomy. The requirement was phased in by filer size. The largest of the large accelerated filers (those with a public float above $5 billion) were required to comply beginning with fiscal periods ending after June 15, 2009. The remaining large accelerated filers (public float of $700 million or more) followed in 2010, and all other filers in 2011. Inline XBRL (iXBRL), which embeds the tags directly in the HTML filing rather than in a separate document, was phased in by filer category starting in 2019.

The EDGAR universe covers roughly 7,000 active operating public company filers. The broader total of current filers—including exchange-traded funds, closed-end funds, business development companies, and other registered investment companies—is approximately 13,000. The XBRL mandate applies to operating companies filing 10-K and 10-Q forms; registered investment companies use a different reporting taxonomy (US-GAAP investment company elements) and are covered by separate SEC data products.

Each quarter, the SEC processes all XBRL submissions received during that period and publishes extracted data in two formats. The Financial Statements and Notes (FSN) dataset is a set of quarterly bulk zip files, available at www.sec.gov/dera/data/financial-statements, that can be downloaded and processed offline. The EDGAR data APIs (Company Facts and the frames endpoint) provide programmatic access to the same underlying data through a JSON interface, without requiring a bulk download.

The US-GAAP taxonomy: concept names as a lingua franca

The central organizing principle of XBRL financial data is the concept name: a standardized label from the US-GAAP taxonomy that identifies what a reported number represents. The taxonomy is maintained by the Financial Accounting Standards Board in coordination with the SEC. Concept names follow a namespace:LocalName convention. The two principal namespaces in EDGAR data are:

  • us-gaap: the core US Generally Accepted Accounting Principles taxonomy, covering income statement, balance sheet, cash flow, and equity statement concepts. Examples: us-gaap:Revenues, us-gaap:NetIncomeLoss, us-gaap:Assets, us-gaap:LongTermDebt, us-gaap:CashAndCashEquivalentsAtCarryingValue, us-gaap:Goodwill. Elements whose names end in Abstract, such as us-gaap:GoodwillAndIntangibleAssetsDisclosureAbstract, are presentation groupings and carry no reported values. The taxonomy contains thousands of elements covering every aspect of US GAAP financial reporting.
  • dei: the Document and Entity Information namespace, covering entity-level metadata rather than financial statement line items. Key elements: dei:EntityCommonStockSharesOutstanding, dei:EntityPublicFloat, dei:EntityFilerCategory, dei:TradingSymbol, dei:DocumentPeriodEndDate. DEI elements are reported once per filing and provide the company metadata needed to interpret the financial data.

Each XBRL fact has three dimensions beyond its concept name: the reporting entity (identified by CIK), the time period (a start date and end date for duration concepts like revenues, or a single date for instant concepts like total assets), and the unit of measure (USD for monetary values, shares for share counts, pure for ratios). A complete XBRL fact is the combination of concept + entity + period + unit + value. The period dimension is what makes it possible to build time series for any company across all of its filings, and the concept dimension is what makes cross-sectional screening—all companies reporting a given concept in a given period—possible.
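The fact model described above can be sketched as a small immutable record. This is an illustration only: the class name, field names, and the two grouping keys are assumptions made for this example and are not an SEC schema, and the entity and values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class XbrlFact:
    concept: str                  # qualified name, e.g. "us-gaap:Revenues"
    cik: int                      # reporting entity
    end: str                      # period end date, ISO format
    unit: str                     # "USD", "shares", "pure", ...
    value: float
    start: Optional[str] = None   # None for instant concepts

    @property
    def is_instant(self) -> bool:
        return self.start is None

    def series_key(self) -> tuple:
        """Groups facts into one company's time series for one concept."""
        return (self.cik, self.concept, self.unit)

    def cross_section_key(self) -> tuple:
        """Groups facts into one concept-period slice across all companies."""
        return (self.concept, self.unit, self.start, self.end)


# Placeholder entity and values, not reported figures.
revenue = XbrlFact("us-gaap:Revenues", 1234567, "2024-12-31", "USD", 5.0e8,
                   start="2024-10-01")
assets = XbrlFact("us-gaap:Assets", 1234567, "2024-12-31", "USD", 2.0e9)
print(revenue.is_instant, assets.is_instant)
```

Grouping by series_key gives the per-company time series; grouping by cross_section_key gives the all-companies slice for one period.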

Common concepts used for financial screening and factor construction include:

  • Income statement: us-gaap:Revenues or us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax (post-ASC 606), us-gaap:GrossProfit, us-gaap:OperatingIncomeLoss, us-gaap:NetIncomeLoss, us-gaap:EarningsPerShareBasic, us-gaap:WeightedAverageNumberOfSharesOutstandingBasic.
  • Balance sheet: us-gaap:Assets, us-gaap:Liabilities, us-gaap:StockholdersEquity, us-gaap:LongTermDebt, us-gaap:Goodwill, us-gaap:CashAndCashEquivalentsAtCarryingValue, us-gaap:AccountsReceivableNetCurrent.
  • Cash flow: us-gaap:NetCashProvidedByUsedInOperatingActivities, us-gaap:CapitalExpendituresIncurredButNotYetPaid, us-gaap:PaymentsToAcquirePropertyPlantAndEquipment, us-gaap:ShareBasedCompensation.

The EDGAR Company Facts API

The Company Facts API provides per-company JSON at a stable endpoint:

https://data.sec.gov/api/xbrl/companyfacts/CIK{10-digit-CIK}.json

The CIK must be zero-padded to ten digits. For example, Apple Inc. (CIK 320193) is accessed at CIK0000320193.json. The response is a JSON document with a facts key containing all XBRL facts the company has ever reported, organized by namespace and then by concept name. Under each concept is a units object whose keys are unit strings (USD, shares, etc.), each containing an array of fact objects. Each fact object carries the accession number, the filing form type, the filing date (the filed field), the period start and end dates (only an end date for instants), and the reported value.
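A minimal sketch of the two mechanical steps described above: padding the CIK into the URL and walking the nested response down to one concept's facts. The helper names are hypothetical, and the sample payload is a hand-written stand-in with placeholder values. It uses the field names accn, form, filed, end, and val as they are understood to appear in the response; check them against a live response before relying on them.

```python
def company_facts_url(cik: int) -> str:
    """Build the Company Facts URL with the CIK zero-padded to ten digits."""
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{cik:010d}.json"


def concept_facts(payload: dict, namespace: str, concept: str, unit: str) -> list:
    """Walk facts -> namespace -> concept -> units -> unit; [] if a level is missing."""
    return (payload.get("facts", {})
                   .get(namespace, {})
                   .get(concept, {})
                   .get("units", {})
                   .get(unit, []))


# Hand-written stand-in for a response; every value is a placeholder.
sample = {
    "facts": {
        "us-gaap": {
            "Assets": {
                "units": {
                    "USD": [
                        {"end": "2024-12-31", "val": 1000,
                         "accn": "0000000000-25-000001",
                         "form": "10-K", "filed": "2025-02-01"},
                    ]
                }
            }
        }
    }
}

for fact in concept_facts(sample, "us-gaap", "Assets", "USD"):
    print(fact["end"], fact["form"], fact["val"])
print(company_facts_url(320193))
```

Returning an empty list for a missing concept keeps callers simple, since many companies never report a given concept.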

A companion endpoint, the Submissions API, returns the company's filing index:

https://data.sec.gov/submissions/CIK{10-digit-CIK}.json

The Submissions endpoint returns all recent filings—10-K, 10-Q, 8-K, DEF 14A, Form 4, and others—with their accession numbers, filing dates, and form types. For companies with long filing histories, older filings are paginated into separate files listed in the files array of the response. This endpoint is the starting point for building a complete filing inventory for any public company.
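Building a filing inventory from this response might look like the sketch below. It assumes the recent filings are stored column-wise as parallel arrays under filings.recent, with keys accessionNumber, filingDate, and form. That layout is an assumption not stated in the text above and should be verified against a live response. The function name and the sample data are made up for illustration.

```python
def recent_filings(submissions: dict, forms: tuple = ("10-K", "10-Q")) -> list:
    """Turn the assumed parallel arrays into one dict per filing, filtered by form."""
    recent = submissions.get("filings", {}).get("recent", {})
    columns = zip(
        recent.get("accessionNumber", []),
        recent.get("filingDate", []),
        recent.get("form", []),
    )
    return [
        {"accn": accn, "filed": filed, "form": form}
        for accn, filed, form in columns
        if form in forms
    ]


# Hand-written stand-in; the accession numbers are placeholders.
sample = {
    "filings": {
        "recent": {
            "accessionNumber": ["0000000000-25-000003", "0000000000-25-000002",
                                "0000000000-25-000001"],
            "filingDate": ["2025-02-01", "2025-01-15", "2024-11-01"],
            "form": ["10-K", "8-K", "10-Q"],
        }
    }
}

for filing in recent_filings(sample):
    print(filing["filed"], filing["form"], filing["accn"])
```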

The frames endpoint: cross-sectional screening

The most powerful tool in the EDGAR API for quantitative work is the frames endpoint. Rather than retrieving one company's history for a concept, it returns every company's reported value for one concept in one specific period:

https://data.sec.gov/api/xbrl/frames/us-gaap/{concept}/{unit}/{period}.json

The period format uses a calendar year code. Duration concepts (income statement items covering a span of time) use CY{year}Q{n} for quarterly periods or CY{year} for annual. Instant concepts (balance sheet items reported at a single date) append an I suffix: CY{year}Q{n}I. So us-gaap/Revenues/USD/CY2024Q4.json returns every company that reported quarterly revenues in Q4 2024 under the Revenues concept.
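The period codes can be generated rather than typed by hand. The sketch below follows the rules in the paragraph above; the function names are hypothetical, and the choice to reject an instant code without a quarter is a conservative assumption, since only the quarterly instant form is described here.

```python
from typing import Optional


def frame_period(year: int, quarter: Optional[int] = None, instant: bool = False) -> str:
    """Build a frames period code: CY2024, CY2024Q4, or CY2024Q4I."""
    if quarter is not None and quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    if instant and quarter is None:
        # Assumption: only the quarterly instant form is covered in the text.
        raise ValueError("instant periods need a quarter")
    code = f"CY{year}"
    if quarter is not None:
        code += f"Q{quarter}"
    if instant:
        code += "I"
    return code


def frame_url(concept: str, unit: str, period: str, taxonomy: str = "us-gaap") -> str:
    """Assemble the frames URL for one concept, unit, and period."""
    return f"https://data.sec.gov/api/xbrl/frames/{taxonomy}/{concept}/{unit}/{period}.json"


print(frame_url("Revenues", "USD", frame_period(2024, 4)))
print(frame_url("Assets", "USD", frame_period(2024, 4, instant=True)))
```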

The response contains an array named data with one entry per reporting company. Each entry is a JSON object with the accession number (accn), CIK (cik), entity name (entityName), location code (loc), period end date (end), and reported value (val); entries for duration concepts also include the period start date (start). A single frames request for a high-coverage concept like Revenues or Assets returns data for thousands of companies in a single HTTP call. This makes the frames endpoint the most efficient path for cross-sectional factor construction: one request per concept per period yields a complete cross-section that would require thousands of individual company requests via the Company Facts API.

Note that not all companies report under the same top-level concept for revenue. An insurer might use us-gaap:Revenues while a technology company that adopted ASC 606 reports under us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax. A comprehensive revenue screen typically unions results from both concepts and deduplicates by CIK.
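The union-and-deduplicate step can be sketched as a merge over frames results listed in priority order. The function name is hypothetical, each row is assumed to be a dict with a cik key, and giving the ASC 606 concept priority when a company reports both is a design choice made for this example. The sample rows are placeholders.

```python
def union_by_cik(frames_in_priority_order: list) -> dict:
    """Merge several frames results, keeping the first row seen for each CIK."""
    merged = {}
    for rows in frames_in_priority_order:
        for row in rows:
            # setdefault keeps the earlier (higher-priority) row on collision.
            merged.setdefault(row["cik"], row)
    return merged


# Placeholder rows standing in for two frames responses.
asc606_rows = [
    {"cik": 1, "entityName": "Alpha Corp", "val": 900},
    {"cik": 2, "entityName": "Beta Inc", "val": 400},
]
revenues_rows = [
    {"cik": 2, "entityName": "Beta Inc", "val": 410},
    {"cik": 3, "entityName": "Gamma Insurance", "val": 700},
]

combined = union_by_cik([asc606_rows, revenues_rows])
for cik, row in sorted(combined.items()):
    print(cik, row["entityName"], row["val"])
```

When the two concepts disagree for the same company, as with CIK 2 here, the difference is worth logging rather than silently discarding.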

The bulk FSN dataset

For offline analysis or environments where API rate limits are constraining, the SEC publishes the complete EDGAR XBRL dataset as quarterly zip files through its Division of Economic and Risk Analysis (DERA) data library at www.sec.gov/dera/data/financial-statements. The EDGAR full-text search service at efts.sec.gov is a separate product: it indexes the text of filings rather than serving the FSN files.

Each quarterly FSN zip contains tab-delimited text files. Four of them do most of the work: num.tsv (the numeric fact values), sub.tsv (submission metadata, one row per filing), tag.tsv (concept definitions from the taxonomy), and pre.tsv (the presentation structure showing how concepts are organized in the company's filing). These are relational tables linked by shared keys such as the accession number, not one file per financial statement. The num.tsv file is the primary working dataset: each row is one XBRL fact with columns for accession number, tag name, version, coreg (co-registrant, set when the fact belongs to an entity other than the primary filer), ddate (period end date), qtrs (number of quarters the period covers), uom (unit of measure), value, and footnote.
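Reading num.tsv needs nothing beyond the standard csv module. The sketch below assumes the file is tab-delimited, as the .tsv extension implies, that the accession number column is headed adsh, and that ddate is formatted YYYYMMDD. Confirm those header names against the dataset's own documentation. The function name and sample rows are placeholders.

```python
import csv
import io

# Placeholder rows standing in for a slice of num.tsv.
SAMPLE_NUM = (
    "adsh\ttag\tversion\tddate\tqtrs\tuom\tvalue\n"
    "0000000000-25-000001\tRevenues\tus-gaap/2024\t20241231\t1\tUSD\t500000000\n"
    "0000000000-25-000001\tRevenues\tus-gaap/2024\t20241231\t4\tUSD\t1900000000\n"
    "0000000000-25-000001\tAssets\tus-gaap/2024\t20241231\t0\tUSD\t2000000000\n"
)


def select_facts(num_file, tag: str, qtrs: int, uom: str = "USD") -> list:
    """Return facts for one tag and period length (qtrs=0 instant, 1 quarter, 4 year)."""
    reader = csv.DictReader(num_file, delimiter="\t")
    selected = []
    for row in reader:
        if (row["tag"] == tag and row["uom"] == uom
                and row["qtrs"] == str(qtrs) and row["value"]):
            selected.append({
                "adsh": row["adsh"],
                "ddate": row["ddate"],
                "value": float(row["value"]),
            })
    return selected


quarterly = select_facts(io.StringIO(SAMPLE_NUM), "Revenues", qtrs=1)
print(quarterly)
```

For a real file, open it with open(path, newline="", encoding="utf-8") in place of the StringIO object.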

The FSN bulk dataset includes facts from the notes to financial statements in addition to the face of the financial statements themselves. Companies are required to tag disaggregated revenue, segment data, lease disclosures, and other note-level detail under specific US-GAAP concepts. This note-level data—which the API also returns—is what makes it possible to build screens for specific accounting patterns like operating lease liabilities as a percentage of total assets, or deferred revenue growth as a revenue quality signal.

The EDGAR full-text search API at efts.sec.gov/LATEST/search-index complements the structured XBRL data by supporting keyword search across the prose of all EDGAR filings. A query like q="goodwill impairment"&dateRange=custom&startdt=2024-01-01&enddt=2024-12-31 returns all filings mentioning goodwill impairment in 2024, useful for identifying companies that disclosed impairment charges in 10-K narratives before the structured XBRL tags are indexed.

Data quality issues

Working with EDGAR XBRL data requires navigating three classes of data quality issues that affect virtually every large-scale use case.

XBRL extension elements

The US-GAAP taxonomy does not cover every line item that every company reports. Companies are permitted to create custom extension elements—XBRL concept names under their own namespace—for items that have no direct taxonomy equivalent. A company might report its primary revenue line under an extension element like xyz:NetworkAccessRevenue rather than the standard us-gaap:Revenues. Extension elements do not appear in the frames endpoint (which only covers standard taxonomy concepts) and reduce the coverage of cross-sectional screens. The prevalence of extensions is highest in regulated industries—banking, insurance, utilities—where GAAP reporting requirements diverge most from the standard taxonomy structure.
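A rough way to gauge how much of a filing a standard-concept screen will miss is to classify each qualified concept name by its namespace prefix. The sketch below treats only the two namespaces named in this article as standard, which is a simplifying assumption; other standard taxonomies may need to be added to the set. The function names and sample tags are hypothetical.

```python
# Assumption: only the two namespaces discussed in this article count as standard.
STANDARD_PREFIXES = frozenset({"us-gaap", "dei"})


def is_extension(qualified_name: str) -> bool:
    """True when a namespace:LocalName concept is outside the standard prefixes."""
    prefix, separator, _local = qualified_name.partition(":")
    if not separator:
        raise ValueError(f"expected namespace:LocalName, got {qualified_name!r}")
    return prefix not in STANDARD_PREFIXES


def extension_share(qualified_names) -> float:
    """Fraction of the given concept names that are company extensions."""
    names = list(qualified_names)
    if not names:
        return 0.0
    return sum(is_extension(name) for name in names) / len(names)


# Placeholder tag list for one hypothetical filing.
tags = [
    "us-gaap:Assets",
    "us-gaap:NetIncomeLoss",
    "dei:EntityPublicFloat",
    "xyz:NetworkAccessRevenue",
]
print(extension_share(tags))
```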

Restated periods

When a company restates previously reported financial results, it files an amended 10-K/A or 10-Q/A. The EDGAR XBRL data will then contain two or more facts for the same concept, entity, and period: one from the original filing and one from the amendment. Selecting the restated value requires grouping facts by concept, entity, and period and keeping the one with the latest filing date. Failure to handle restatements correctly produces duplicated or inconsistent observations that corrupt time-series and cross-sectional analyses. The filed field in the Company Facts API fact objects provides the filing date needed to implement this deduplication.
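The deduplication can be sketched as a single pass over one company's facts for one concept and unit. The function name and the prefer_latest switch are assumptions for this example, and the facts are placeholders. The filed dates are ISO strings, so plain string comparison orders them correctly.

```python
def dedupe_by_period(facts: list, prefer_latest: bool = True) -> list:
    """Keep one fact per (start, end) period, chosen by filing date."""
    chosen = {}
    for fact in facts:
        key = (fact.get("start"), fact["end"])   # start is absent for instants
        current = chosen.get(key)
        if current is None:
            chosen[key] = fact
        elif prefer_latest and fact["filed"] > current["filed"]:
            chosen[key] = fact
        elif not prefer_latest and fact["filed"] < current["filed"]:
            chosen[key] = fact
    return sorted(chosen.values(), key=lambda f: f["end"])


# Placeholder facts: the first quarter was restated in a later amendment.
facts = [
    {"start": "2024-01-01", "end": "2024-03-31", "val": 100,
     "form": "10-Q", "filed": "2024-05-01"},
    {"start": "2024-01-01", "end": "2024-03-31", "val": 90,
     "form": "10-Q/A", "filed": "2024-08-15"},
    {"start": "2024-04-01", "end": "2024-06-30", "val": 110,
     "form": "10-Q", "filed": "2024-08-01"},
]

print([f["val"] for f in dedupe_by_period(facts)])
```

Keeping the latest value suits descriptive analysis. A backtest that must avoid look-ahead would pass prefer_latest=False to use what was known at the time.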

Unit inconsistencies

US GAAP financial statements allow companies to present figures in any scale provided the scale is disclosed. Most large companies present in thousands of dollars; some present in millions; a few present in individual dollars. The tagged XBRL value is supposed to be the full, unscaled amount regardless of presentation, but errors occur. A company presenting in millions whose XBRL values are scaled as if the statements were in thousands will appear to have revenue three orders of magnitude below its actual figure. Detecting scale errors requires cross-referencing the XBRL values against the reporting scale disclosed in the statement headings (for example, "in thousands") and looking for values that are statistical outliers relative to the company's own history or to peers of similar size; dei:EntityFilerCategory and dei:EntityPublicFloat give a rough size check but do not record the reporting scale. For any large-scale analysis, a unit validation step (comparing XBRL values to the values that appear on the face of the HTML financial statements) is advisable before using the data for investment or research conclusions.
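One inexpensive screen for scale errors compares a value to a reference figure, such as the same company's prior-period value or a peer median. It flags ratios that sit close to a power of one thousand. The sketch below is a heuristic, not a validation procedure: the function name, the 0.3 log-unit tolerance, and the sample numbers are all assumptions for illustration.

```python
import math
from typing import Optional


def scale_mismatch(value: float, reference: float, tolerance: float = 0.3) -> Optional[int]:
    """Return the power of ten (-6, -3, 3, 6) that value/reference is near, else None."""
    if value <= 0 or reference <= 0:
        return None   # ratios are only meaningful for positive amounts
    log_ratio = math.log10(value / reference)
    for exponent in (-6, -3, 3, 6):
        if abs(log_ratio - exponent) <= tolerance:
            return exponent
    return None


# Placeholder figures: the current value looks a thousand times too small.
print(scale_mismatch(4_200, 4_100_000))
print(scale_mismatch(4.3e9, 4.1e9))
```

A flagged fact is a candidate for manual comparison with the HTML statement, not proof of an error, since a genuine thousandfold change is possible for a very small company.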

Rate limits and access policy

The SEC's rate limit for all EDGAR data APIs is ten requests per second from a single IP address. Exceeding this limit results in HTTP 429 responses and temporary blocking. All automated requests must include a descriptive User-Agent header containing the requester's name and email address—for example, User-Agent: Research Project research@example.com. Requests without a descriptive User-Agent may be blocked. For high-volume work, the SEC encourages downloading the bulk FSN quarterly zip files rather than making individual API calls, as bulk downloads are more efficient from the SEC's infrastructure perspective and avoid rate limit constraints.
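A client can stay under the limit by spacing its own requests. The sketch below is one simple approach: a minimum interval between calls, using only the standard library. The class and function names are hypothetical. The default of eight requests per second is a margin chosen for this example, and the User-Agent value is the example string from the paragraph above. The clock and sleep functions are injectable so the limiter can be tested without waiting.

```python
import time
import urllib.request

USER_AGENT = "Research Project research@example.com"   # example value; use your own


class RateLimiter:
    """Enforce a minimum interval between calls to wait()."""

    def __init__(self, max_per_second: float = 8.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> None:
        delay = self._next_allowed - self._clock()
        if delay > 0:
            self._sleep(delay)
        # Re-read the clock so an overshooting sleep does not shorten the next gap.
        self._next_allowed = max(self._clock(), self._next_allowed) + self.min_interval


def polite_get(url: str, limiter: RateLimiter) -> bytes:
    """Fetch a URL after waiting for the limiter, sending the User-Agent header."""
    limiter.wait()
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

A single RateLimiter instance should be shared by every request in the process, since the limit applies per IP address rather than per endpoint.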

Python: cross-sectional revenue screen with year-over-year growth

The script below uses the frames endpoint to fetch quarterly revenue data for all reporting companies across two consecutive years, filters to companies above $1 billion in revenue, computes year-over-year growth, and ranks the top growers. The frames approach requires two API calls regardless of how many thousands of companies report the concept, making it far more efficient than per-company Company Facts calls for cross-sectional work.

import requests

# The SEC asks for a descriptive User-Agent with a name and a contact email.
HEADERS = {"User-Agent": "Research Project research@example.com"}
BASE = "https://data.sec.gov"

# --- Step 1: fetch all companies reporting us-gaap:Revenues in a given quarter ---
# The "frames" endpoint returns every company's reported value for one concept
# in one period. CY2024Q4I means calendar year 2024, Q4, instant (point-in-time).
# For flow concepts like Revenues use CY{year}Q{n} (duration), not the I suffix.

def fetch_revenue_frame(year: int, quarter: int) -> list[dict]:
    """Return all companies that reported us-gaap:Revenues for a given quarter."""
    period = "CY" + str(year) + "Q" + str(quarter)
    url = BASE + "/api/xbrl/frames/us-gaap/Revenues/USD/" + period + ".json"
    r = requests.get(url, headers=HEADERS, timeout=30)
    if r.status_code == 404:
        return []
    r.raise_for_status()
    data = r.json()
    # data["data"] is a list of objects keyed by accn, cik, entityName, loc,
    # end, and val (duration concepts also carry start).
    results = []
    for row in data.get("data", []):
        results.append({
            "accn": row["accn"],
            "cik": str(row["cik"]).zfill(10),
            "entity_name": row["entityName"],
            "period_end": row["end"],
            "revenue_usd": row["val"],
        })
    return results

# --- Step 2: fetch company facts for one company (optional drill-down; not called by the screen below) ---

def fetch_company_revenues(cik_padded: str) -> list[dict]:
    """Return all reported us-gaap:Revenues facts for a company across all filings."""
    url = BASE + "/api/xbrl/companyfacts/CIK" + cik_padded + ".json"
    r = requests.get(url, headers=HEADERS, timeout=30)
    r.raise_for_status()
    data = r.json()

    facts = (data
             .get("facts", {})
             .get("us-gaap", {})
             .get("Revenues", {})
             .get("units", {})
             .get("USD", []))

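    # Keep facts from original 10-K or 10-Q filings (the form filter drops
    # 10-K/A and 10-Q/A amendments) that carry a calendar frame label (CY...).
    # Frame labels cover both annual and quarterly periods, so callers that
    # need one or the other should also check the start and end dates.
    framed = [
        f for f in facts
        if f.get("form") in ("10-Q", "10-K")
        and f.get("frame", "").startswith("CY")
    ]
    return framed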

# --- Step 3: cross-sectional screen ---
# Fetch Q4 revenue for two consecutive years, compute YoY growth, find top growers.

BILLION = 1_000_000_000
MIN_REVENUE = 1 * BILLION   # only companies reporting at least $1B

print("Fetching Q4 2023 revenue frame...")
rev_2023 = fetch_revenue_frame(2023, 4)
print("Fetching Q4 2024 revenue frame...")
rev_2024 = fetch_revenue_frame(2024, 4)

# Build lookup: cik -> revenue for each year
rev_map_2023 = {r["cik"]: r["revenue_usd"] for r in rev_2023}
rev_map_2024 = {r["cik"]: r for r in rev_2024}

# Filter to companies above $1B in 2024 that also reported in 2023
results = []
for cik, rec in rev_map_2024.items():
    rev_cur = rec["revenue_usd"]
    if rev_cur < MIN_REVENUE:
        continue
    rev_prior = rev_map_2023.get(cik)
    if rev_prior is None or rev_prior <= 0:
        continue
    growth = (rev_cur - rev_prior) / rev_prior
    results.append({
        "cik": cik,
        "entity_name": rec["entity_name"],
        "revenue_2024": rev_cur,
        "revenue_2023": rev_prior,
        "yoy_growth": growth,
    })

results.sort(key=lambda x: x["yoy_growth"], reverse=True)

print("")
print("Top 10 revenue growers (>$1B, Q4 YoY, 2023-2024):")
print("-" * 72)
header = "Company".ljust(36) + "Rev 2024 ($B)".rjust(14) + "YoY Growth".rjust(12)
print(header)
print("-" * 72)
for rec in results[:10]:
    name   = rec["entity_name"][:35].ljust(36)
    rev_b  = str(round(rec["revenue_2024"] / BILLION, 2)).rjust(14)
    growth = (str(round(rec["yoy_growth"] * 100, 1)) + "%").rjust(12)
    print(name + rev_b + growth)

The script queries only the us-gaap:Revenues concept. A production implementation would also fetch us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax (the ASC 606 concept adopted by most companies after 2018) and union both results, deduplicating by CIK so each company appears once. The YoY computation uses the frames data directly. For more precise period matching, the Company Facts API provides individual filing-level period dates that allow exact alignment of Q4 to Q4 based on the reported end date rather than the calendar year code.

What you can build with EDGAR financial data

The EDGAR XBRL dataset is the foundation for a range of quantitative applications that previously required expensive commercial data licenses.

Factor investing. Value factors (price-to-book, price-to-earnings, enterprise value-to-EBITDA) and quality factors (return on equity, accruals ratio, gross profit margin) can be constructed from EDGAR XBRL data across the full US public equity universe, with market prices supplied from a separate source. The frames endpoint for each balance sheet and income statement concept provides the cross-sectional data needed for monthly factor portfolio rebalancing. The debt-to-equity ratio (us-gaap:LongTermDebt divided by us-gaap:StockholdersEquity) is computable from two frames requests; both are instant concepts, so the period code takes the I suffix. Revenue growth and gross margin trends require several quarters of history from the Company Facts API or the FSN bulk dataset.
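The two-request debt-to-equity calculation reduces to a join on CIK. The sketch below assumes each frames row is a dict with cik and val keys, and it skips companies with zero or negative equity, which is a design choice for this example because the ratio is not meaningful there. The function name and sample rows are placeholders.

```python
def debt_to_equity(debt_rows: list, equity_rows: list) -> dict:
    """Join two frames results on CIK and return LongTermDebt / StockholdersEquity."""
    equity_by_cik = {row["cik"]: row["val"] for row in equity_rows}
    ratios = {}
    for row in debt_rows:
        equity = equity_by_cik.get(row["cik"])
        if equity is None or equity <= 0:
            continue   # no equity figure, or the ratio is not meaningful
        ratios[row["cik"]] = row["val"] / equity
    return ratios


# Placeholder rows standing in for two instant-period frames responses.
debt = [{"cik": 1, "val": 50}, {"cik": 2, "val": 10}, {"cik": 3, "val": 5}]
equity = [{"cik": 1, "val": 100}, {"cik": 2, "val": -20}]

print(debt_to_equity(debt, equity))
```

Companies that appear in only one of the two frames drop out of the join, so the size of the result is itself a coverage check.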

Earnings quality signals. The accruals ratio (net income minus operating cash flow, scaled by average total assets) is a documented predictor of future earnings disappointments; high accruals suggest aggressive revenue recognition or expense deferral. Stock-based compensation as a percentage of revenue (us-gaap:ShareBasedCompensation from the cash flow statement, normalized by us-gaap:Revenues) is a standard earnings quality adjustment in technology sector analysis. Deferred revenue changes (an increase in deferred revenue suggests demand strength; a decrease may signal customer attrition) are available under us-gaap:DeferredRevenueCurrent.
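The accruals ratio as defined above is short enough to write out directly. In this sketch the function and argument names are hypothetical and the inputs are placeholder amounts. One plausible mapping to XBRL concepts is us-gaap:NetIncomeLoss, us-gaap:NetCashProvidedByUsedInOperatingActivities, and us-gaap:Assets at the start and end of the period.

```python
def accruals_ratio(net_income: float, operating_cash_flow: float,
                   assets_begin: float, assets_end: float) -> float:
    """(net income - operating cash flow) / average total assets."""
    average_assets = (assets_begin + assets_end) / 2.0
    if average_assets <= 0:
        raise ValueError("average total assets must be positive")
    return (net_income - operating_cash_flow) / average_assets


# Placeholder amounts: earnings exceed cash flow by 30 on average assets of 1,000.
print(accruals_ratio(net_income=120.0, operating_cash_flow=90.0,
                     assets_begin=900.0, assets_end=1100.0))
```

The net income and cash flow figures must cover the same period, and the two asset figures must be the instants that bracket it.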

Goodwill tracking. Goodwill accumulation through acquisitions and subsequent impairment write-downs are fully captured in the XBRL data. A company whose goodwill grows from $2 billion to $8 billion over four years has made significant acquisitions; the subsequent appearance of Item 2.06 impairment 8-K filings or us-gaap:GoodwillImpairmentLoss facts signals that those acquisitions have underperformed. Journalists covering M&A outcomes and academic researchers studying acquisition performance regularly use EDGAR XBRL goodwill data.

Financial distress screening. The combination of declining revenue, increasing long-term debt, negative operating cash flow, and rapidly shrinking stockholders' equity—all available from standard XBRL concepts—approximates the inputs to the Altman Z-score and similar distress prediction models. Running these screens across the full EDGAR universe identifies companies approaching financial difficulty months before bankruptcy filings appear in the 8-K feed.
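For reference, the original Altman Z-score for public manufacturing companies combines five ratios with fixed weights, as sketched below. The function and argument names are hypothetical and the inputs are placeholders. The market value of equity is not an XBRL fact and would have to come from price data. Using operating income as a stand-in for EBIT, with the remaining inputs taken from standard balance sheet and income statement concepts, is an assumption rather than an established mapping.

```python
def altman_z(working_capital: float, retained_earnings: float, ebit: float,
             market_value_equity: float, total_liabilities: float,
             sales: float, total_assets: float) -> float:
    """Original Altman Z: 1.2*X1 + 1.4*X2 + 3.3*X3 + 0.6*X4 + 1.0*X5."""
    if total_assets <= 0 or total_liabilities <= 0:
        raise ValueError("total assets and total liabilities must be positive")
    x1 = working_capital / total_assets        # liquidity
    x2 = retained_earnings / total_assets      # cumulative profitability
    x3 = ebit / total_assets                   # operating profitability
    x4 = market_value_equity / total_liabilities   # needs market data, not XBRL
    x5 = sales / total_assets                  # asset turnover
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


# Placeholder inputs for a hypothetical company.
z = altman_z(working_capital=200.0, retained_earnings=300.0, ebit=100.0,
             market_value_equity=1000.0, total_liabilities=500.0,
             sales=1500.0, total_assets=1000.0)
print(round(z, 2))
```

In the original model, scores below roughly 1.81 are read as distress and scores above roughly 2.99 as safe; those cutoffs were calibrated on manufacturers and transfer poorly to financial companies.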

Government contractor financials. For companies where a significant portion of revenue comes from federal contracts, the EDGAR XBRL data provides the financial picture while the USASpending.gov transaction database provides the underlying contract details. Matching companies across both datasets by name and tax identification number produces a combined view of federal revenue concentration and overall financial health that neither dataset provides alone.

Academic research. Academic finance and accounting researchers have used EDGAR XBRL data extensively since the mandate took effect. Studies of earnings management, revenue recognition changes following ASC 606 adoption, the effect of mandatory XBRL disclosure on analyst forecast accuracy, and the information content of segment reporting all rely on the EDGAR XBRL data as their primary source of financial statement information for US public companies. The data's machine-readable structure, consistent format, and complete coverage of the US public company universe make it the de facto standard for large-sample empirical accounting research.

Cross-referencing other EDGAR datasets

The EDGAR XBRL financial data is most useful in combination with other SEC datasets. The Submissions API provides filing dates and form types that allow the financial data to be linked to specific 10-K and 10-Q filings. The Form 4 insider trading dataset provides transaction-level data on what company insiders were doing with their personal holdings around the same periods reflected in the financial statements—a connection that researchers use to study whether insider buying anticipates earnings surprises. The Form 8-K dataset, particularly Item 2.06 impairment announcements and Item 4.02 non-reliance filings, provides event-level context for interpreting changes in the financial statement data. The CIK is the join key across all EDGAR datasets, enabling a unified view of any public company across its filings, financials, insider transactions, and material event disclosures.

