DOL Form 5500: The Federal Database Behind Every US Pension and Benefit Plan

18 min read · AI Analytics
DOL · Form 5500 · Pensions · Retirement Benefits · Federal Data

Every major private-sector employee benefit plan in the United States is required to file a Form 5500 Annual Return/Report with the Department of Labor. More than 217,000 large-plan filings arrive each year, representing defined benefit pensions, 401(k) and 403(b) plans, profit-sharing plans, ESOPs, and health and welfare plans. Collectively they disclose the financial condition of benefit programs covering tens of millions of workers and retirees. The assets on record exceed $12 trillion across all ERISA-covered plans. Every filing is public record, searchable through the DOL's EFAST2 system and available as bulk structured datasets going back to 2009. Almost nobody outside ERISA litigation and federal enforcement uses the data systematically.

This article covers the origins and statutory basis of Form 5500, the plan types it covers, the modular schedule architecture that makes each filing a layered financial disclosure, the PBGC insurance system for defined benefit plans, the fee-transparency mechanics of Schedule C and their role in ERISA litigation, the funding rules that govern defined benefit plans and the Schedule SB actuarial disclosures, the large-plan audit requirement and its enforcement history, how to access the bulk data through EFAST2 and the DOL's research datasets, and a Python example for downloading the annual files and analyzing plan asset concentration and administrative fee rates by plan size tier.

Origins and statutory basis

Form 5500 was created by Congress as part of the Employee Retirement Income Security Act of 1974 — ERISA — the landmark statute that established federal oversight of private-sector employee benefit plans for the first time. Before ERISA, there was no requirement that pension plans be funded adequately, no federal insurance for pension benefits, no minimum vesting standards, and no requirement to disclose plan finances to participants or the government. A worker could spend thirty years with an employer, be promised a pension, and find on retirement that the pension fund had been mismanaged into insolvency. The Studebaker Corporation pension collapse of 1963 — which left 4,000 workers with little or nothing after the automaker closed its South Bend plant — was the catalyzing event that produced a decade of congressional deliberation and ultimately ERISA.

ERISA is jointly administered by three agencies: the Department of Labor (which enforces the fiduciary and reporting requirements), the Internal Revenue Service (which enforces the tax qualification rules), and the Pension Benefit Guaranty Corporation (which insures defined benefit plan benefits). Form 5500 is the annual reporting instrument through which all three agencies receive the data they need to fulfill their oversight functions. The SECURE Act of 2019 and the SECURE 2.0 Act of 2022 amended ERISA and the Internal Revenue Code in various ways affecting what Form 5500 captures, including new rules for Pooled Employer Plans and expanded coverage of part-time workers.

The filing threshold is 100 participants at the beginning of the plan year. Plans at or above that threshold file the full Form 5500 with all applicable schedules and must engage an independent qualified public accountant to audit the plan's financial statements. Plans with fewer than 100 participants generally file the simplified Form 5500-SF (short form). One-participant plans, which cover only a business owner (or partners) and their spouses, and certain foreign plans file Form 5500-EZ, which is not publicly disclosed. The 100-participant threshold and the large plan audit requirement together define the boundary of the most detailed, audited portion of the public record.

Plan types and ERISA structure

ERISA recognizes two broad categories of pension plan, each with fundamentally different economic structures: defined benefit plans and defined contribution plans. Understanding the distinction is essential for interpreting Form 5500 data correctly, because the schedules, the funding rules, and the disclosure requirements differ substantially between them.

Defined benefit plans promise a specific monthly benefit at retirement, calculated by a formula that typically incorporates years of service and a measure of career earnings. A common formula: 1.5 percent of high-three-year average salary multiplied by years of service, so that a worker with 30 years of service and a high-three average salary of $80,000 would receive 45 percent of that salary — $36,000 per year — as a lifetime annuity regardless of investment returns. The employer bears the investment risk and the longevity risk. Assets are held in a trust and managed by professional investment managers; the employer must contribute enough to keep the plan actuarially funded. PBGC insurance backstops the benefit if the plan terminates underfunded.
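The formula above is simple enough to check in a few lines. The sketch below reproduces the worked example from the text; the function name `annual_db_benefit` is illustrative only, and real plan formulas usually add early-retirement reductions, benefit caps, and other adjustments that are not modeled here.

```python
def annual_db_benefit(accrual_rate, years_of_service, final_average_salary):
    """Annual annuity under a simple final-average-pay formula."""
    return accrual_rate * years_of_service * final_average_salary


# Worked example from the text: 1.5% accrual, 30 years, $80,000 high-three salary.
salary = 80_000
benefit = annual_db_benefit(0.015, 30, salary)
replacement_rate = benefit / salary

print(f"Annual benefit: ${benefit:,.0f} ({replacement_rate:.0%} of salary)")
```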

DB plan coverage has contracted substantially since the mid-1980s. Active participants in private-sector DB plans peaked at approximately 27 million in 1985. By 2022 that figure had fallen below 13 million, reflecting a decades-long shift to defined contribution plans that transfers investment and longevity risk from employers to employees. The decline is concentrated in the private sector; state and local government plans, which are not covered by ERISA and do not file Form 5500, have held DB coverage more stable.

Defined contribution plans promise only that the employer will make specified contributions to individual participant accounts; investment returns determine the final balance. The 401(k) — named for the Internal Revenue Code section authorizing it — is the dominant form: employees elect to defer a portion of salary on a pre-tax basis, subject to annual IRS limits ($23,000 employee deferral in 2024, $69,000 total including employer contributions, $7,500 additional catch-up for participants age 50 and older). Employers typically match some portion of employee deferrals to incentivize participation. Participants direct their own investments from a menu of options — typically target-date funds, index funds, and sometimes company stock.

403(b) plans are the nonprofit and public education equivalent of 401(k) plans, named for a different IRC section, and operate under similar rules with some historical differences in investment vehicle eligibility. Profit-sharing plans permit employer contributions that vary with company profits, without requiring employee salary deferral. ESOPs — Employee Stock Ownership Plans — hold primarily employer company stock; major ESOP-owned companies include Publix Super Markets, WinCo Foods, and W.L. Gore & Associates. Multiple Employer Plans (MEPs) and the newer Pooled Employer Plans (PEPs) authorized by SECURE allow unrelated small employers to pool administrative resources under a single Form 5500 filing, reducing compliance costs.

Health and welfare plans — group health insurance, dental, vision, life, and disability coverage — also file Form 5500 if they have 100 or more participants, but use different schedules and present different analytical opportunities than pension plans. The financial data for welfare plans is less granular than for pension plans, and the PBGC insurance framework does not apply. This article focuses primarily on pension plans, where Form 5500 data is most actionable for research.

Form 5500 schedule architecture

Form 5500 is a modular filing system. The core form captures plan identification, plan characteristics (coded checkboxes for plan type and benefit features), administrator contact information, and a summary of basic financial data. Schedules attach to the core form depending on plan type, plan size, and the nature of plan assets and arrangements. Each schedule is a separate file in the DOL's bulk research datasets, enabling targeted analysis without downloading the full filing. The key schedules are:

Schedule A — Insurance Information. Required for plans using insurance contracts to fund or hold plan assets. Discloses the carrier, contract number, premiums paid, benefit payments made by the insurer, and commissions paid to agents or brokers. For health and welfare plans using fully-insured arrangements, Schedule A captures the full economic relationship between the plan and the carrier. For defined contribution plans using group annuity contracts (common in 403(b) plans), Schedule A reports the annuity premiums and contract values.

Schedule C — Service Provider Information. The most forensically significant schedule for fee analysis. Required for large plans, Schedule C must list every service provider receiving $5,000 or more in direct or indirect compensation from the plan during the plan year. Covered providers include recordkeepers, third-party administrators, investment managers, brokers, consultants, actuaries, and auditors. For each provider, the schedule reports the services rendered and compensation paid — broken out between direct compensation (paid explicitly from plan assets) and indirect compensation (paid by third parties, typically mutual fund companies via revenue sharing, 12b-1 fees, sub-transfer agent fees, or float income). Schedule C is the primary data source for identifying excessive fee arrangements and undisclosed compensation conflicts.

Schedule D — DFE Information. Used when a plan invests in a Direct Filing Entity such as a master trust, common/collective trust, pooled separate account, or 103-12 investment entity. DFEs file their own Form 5500 reporting the combined assets; participating plans report their share on Schedule D. Large pension funds routinely pool assets in master trusts, so Schedule D is the mechanism connecting the plan-level filing to the underlying investment vehicle.

Schedule G — Financial Transaction Schedules. Reports reportable transactions (large asset purchases or sales), nonexempt party-in-interest transactions, and prohibited transactions. A prohibited transaction — for example, a plan lending money to the plan sponsor or a fiduciary causing the plan to pay excessive compensation to a related party — triggers excise taxes and potential DOL enforcement. Schedule G disclosures are a red flag for fiduciary misconduct.

Schedule H — Financial Information (Large Plans). The full balance sheet and income statement for plans with 100 or more participants. Schedule H captures total plan assets at beginning and end of year (at fair market value), total liabilities, net assets, contributions received, benefit payments, investment gains and losses broken out by asset class, administrative expenses, and transfers. The investment detail in Schedule H — employer securities, real estate, partnerships, registered investment companies, brokerage accounts, and other categories — enables asset allocation analysis across the entire large-plan universe.

Schedule I — Financial Information (Small Plans). The condensed equivalent of Schedule H for plans with fewer than 100 participants. Contains beginning and ending total assets, contributions, and benefits paid, without the full investment detail of Schedule H.

Schedule R — Retirement Plan Information. Captures plan design features including the funding method, actuarial cost method, the identity of the actuary, and for multiemployer plans, withdrawal liability information. Also reports minimum required distributions, in-service distributions, and whether the plan is maintained in a US territory.

Schedule SB — Actuarial Information (Single-Employer DB Plans). The most analytically rich schedule for defined benefit analysis. Reports the actuarial present value of all accrued benefit liabilities (the funding target), the actuarial value and fair market value of plan assets, the adjusted funding target attainment percentage (AFTAP), the funding shortfall or surplus, the minimum required contribution under IRC §430, and the actuarial assumptions used — discount rates, mortality tables, salary scale, and turnover. Schedule SB is filed by the plan's enrolled actuary, who must sign the form under penalty of perjury.

Schedule MB — Actuarial Information (Multiemployer Plans). The equivalent of Schedule SB for multiemployer defined benefit plans, which are maintained under collective bargaining agreements covering workers employed by multiple contributing employers in a given industry. Schedule MB reports the plan's funding ratio, actuarial assumptions, and — critically — whether the plan is in “endangered” (yellow zone, funded below 80 percent) or “critical” (red zone, funded below 65 percent) status under the Pension Protection Act of 2006. Plans in critical and declining status — funded below 65 percent and projected to become insolvent — face additional disclosure requirements and may pursue benefit cuts or plan mergers.
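The zone thresholds described for Schedule MB can be expressed as a small classifier. This is a simplified sketch using only the funded-percentage cutoffs stated above; the statutory tests also consider projected funding deficiencies and insolvency dates, which are reduced here to a single hypothetical flag, `projected_insolvent`. The "green zone" label for plans in neither status is common industry shorthand, not a term taken from the schedule.

```python
def zone_status(funded_pct, projected_insolvent=False):
    """Classify a multiemployer plan by funded percentage (simplified)."""
    if funded_pct < 65:
        if projected_insolvent:
            return "critical and declining"
        return "critical (red zone)"
    if funded_pct < 80:
        return "endangered (yellow zone)"
    return "neither (green zone)"


for pct, insolvent in [(92.0, False), (74.5, False), (61.0, False), (48.0, True)]:
    print(f"{pct:5.1f}% funded -> {zone_status(pct, insolvent)}")
```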

The PBGC and defined benefit plan insurance

The Pension Benefit Guaranty Corporation was established by ERISA as a federal insurance program for private defined benefit plans. When a covered plan terminates with insufficient assets to pay all promised benefits, whether through a distress termination or a termination initiated by PBGC following the plan sponsor's bankruptcy, the PBGC becomes trustee of the plan and pays benefits up to the statutory guarantee limits. For 2024, the maximum annual benefit guaranteed by the PBGC is approximately $85,300 for a participant retiring at age 65 with a straight-life annuity, indexed annually; the guarantee is lower for early retirement and for participants in multiemployer plans.

PBGC is funded by insurance premiums from plan sponsors rather than by congressional appropriations. The single-employer premium structure has two components: a flat-rate premium ($101 per participant in 2024, indexed) and a variable-rate premium assessed on underfunding ($52 per $1,000 of unfunded vested benefits in 2024, subject to a per-participant cap). Plans that are well-funded pay only the flat-rate premium; underfunded plans pay both, creating a direct financial incentive to maintain adequate funding.
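The two-part premium structure can be sketched as follows. All rates in the example call are round placeholder values chosen for illustration, not published PBGC figures; the function assumes the variable-rate cap is expressed per participant and multiplied by headcount to give the plan's cap. Substitute the rates PBGC publishes for the plan year in question.

```python
def pbgc_premium(participants, unfunded_vested_benefits,
                 flat_rate, variable_rate_per_1000, variable_cap_per_participant):
    """Return (flat, variable) premium components for one plan year."""
    flat = participants * flat_rate
    # Variable-rate premium applies only to underfunding, never below zero.
    uvb = max(unfunded_vested_benefits, 0)
    variable = uvb / 1000 * variable_rate_per_1000
    # Assumed cap: per-participant amount times participant count.
    variable = min(variable, participants * variable_cap_per_participant)
    return flat, variable


# Placeholder rates for illustration only.
RATES = dict(flat_rate=100.0, variable_rate_per_1000=52.0,
             variable_cap_per_participant=700.0)

modest = pbgc_premium(1_000, 5_000_000, **RATES)    # cap does not bind
severe = pbgc_premium(1_000, 20_000_000, **RATES)   # cap binds
funded = pbgc_premium(1_000, 0, **RATES)            # flat-rate only

for label, (flat, variable) in [("modest", modest), ("severe", severe), ("funded", funded)]:
    print(f"{label:<7} flat=${flat:,.0f}  variable=${variable:,.0f}")
```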

PBGC administers two separate insurance programs with distinct financial positions. The single-employer program, covering plans sponsored by individual corporations, has been financially sound in recent years, reporting a net financial position exceeding $40 billion in 2023 — a dramatic reversal from earlier decades when major airline and steel company pension terminations threatened the program's solvency. The multiemployer program, covering collectively bargained plans across industries like trucking, mining, and construction, has been chronically underfunded.

The multiemployer crisis peaked in the late 2010s, when approximately 200 plans holding hundreds of billions in liabilities were classified as “critical and declining,” meaning they were projected to become insolvent within 20 years or, in many cases, within a decade. The Central States Pension Fund, the largest troubled plan, covering current and former Teamsters, faced projected insolvency within years. The American Rescue Plan Act of 2021 created the Special Financial Assistance (SFA) program, providing an estimated $86 billion in federal grants to troubled multiemployer plans to restore solvency through 2051. Central States received approximately $35.8 billion in SFA, the largest single award, averting benefit cuts for roughly 350,000 workers and retirees; the program as a whole is expected to protect the benefits of two to three million participants. The SFA program transformed the multiemployer program's financial outlook, though PBGC projects the program may again face stress as demographic realities in declining unionized industries play out over decades.

Schedule C and fee transparency

Schedule C is the instrument through which Congress and the DOL force 401(k) plans to disclose what they actually pay for plan administration, including the indirect compensation that was largely invisible to plan sponsors before the DOL strengthened disclosure requirements in 2012 under ERISA Section 408(b)(2). The final 408(b)(2) regulations require covered service providers (recordkeepers, investment advisers, and others reasonably expecting to receive $1,000 or more in direct or indirect compensation) to disclose their compensation to plan fiduciaries before the plan enters the service arrangement. Schedule C then requires the plan to report this compensation to the DOL annually, creating a public record.
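To make the reporting rule concrete, the sketch below totals direct and indirect compensation per provider and keeps those at or above Schedule C's $5,000 reporting threshold. The provider records are hypothetical, and the tuple layout is an assumption for illustration; it is not the layout of the DOL's Schedule C files.

```python
from collections import defaultdict

REPORTING_THRESHOLD = 5_000  # dollars of direct plus indirect compensation

# Hypothetical records: (provider, direct compensation, indirect compensation)
records = [
    ("Recordkeeper A", 0, 42_000),        # paid entirely through revenue sharing
    ("Investment Manager B", 85_000, 0),
    ("Auditor C", 18_500, 0),
    ("Consultant D", 3_000, 1_200),       # total 4,200: below the threshold
]


def reportable_providers(rows, threshold=REPORTING_THRESHOLD):
    """Sum compensation per provider and keep those at or above the threshold."""
    totals = defaultdict(lambda: {"direct": 0.0, "indirect": 0.0})
    for name, direct, indirect in rows:
        totals[name]["direct"] += direct
        totals[name]["indirect"] += indirect
    return {
        name: comp for name, comp in totals.items()
        if comp["direct"] + comp["indirect"] >= threshold
    }


reportable = reportable_providers(records)
for name, comp in reportable.items():
    print(f"{name:<22} direct=${comp['direct']:>9,.0f}  indirect=${comp['indirect']:>9,.0f}")
```

Note that Recordkeeper A appears in the output even though the plan wrote it no check, which is the point of capturing indirect compensation.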

The mechanics of indirect compensation are central to why Schedule C matters. A 401(k) plan's recordkeeper may charge no explicit fee to the plan while receiving 25 to 50 basis points annually from the mutual fund companies whose funds appear on the plan's investment menu, in exchange for shareholder recordkeeping services. This revenue sharing reduces the service cost to the plan sponsor (and in some cases appears to be zero-cost administration) while causing participants to pay higher expense ratios than they would if the plan used institutional-class fund shares and paid an explicit recordkeeping fee. The arrangement is not prohibited but creates a conflict of interest: the recordkeeper has an incentive to include funds that pay higher revenue sharing rather than funds that are best for participants.
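A quick calculation shows how asset-based revenue sharing scales. The plan size and participant count below are hypothetical; only the 25 to 50 basis point range comes from the text.

```python
def revenue_sharing_dollars(plan_assets, bps):
    """Annual revenue sharing in dollars for a given asset-based rate."""
    return plan_assets * bps / 10_000


# Hypothetical plan: $50 million in assets, 500 participants.
assets = 50_000_000
participants = 500

low = revenue_sharing_dollars(assets, 25)
high = revenue_sharing_dollars(assets, 50)

print(f"Revenue sharing: ${low:,.0f} to ${high:,.0f} per year")
print(f"Per participant: ${low / participants:,.0f} to ${high / participants:,.0f}")
```

Because the payment is a percentage of assets, it grows with account balances even when the recordkeeping work per participant does not change, which is why fiduciaries are expected to compare it against a per-participant fee.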

The plaintiffs' bar has used Schedule C data extensively in ERISA fiduciary duty litigation. Under ERISA Section 404(a), plan fiduciaries must act solely in the interest of participants, with the care of a prudent expert. Courts have held that this requires fiduciaries to monitor plan fees and take action when fees are excessive relative to the services received. The Supreme Court confirmed in Tibble v. Edison International (2015) that the fiduciary duty of prudence is continuing: fiduciaries must periodically review plan investments and service arrangements, not merely exercise prudence at the time of initial selection. Schedule C provides the documentary foundation for arguing that a plan paid above-market fees: attorneys compare the target plan's Schedule C to those of peer plans of similar size and asset level to establish the range of reasonable compensation for equivalent services. ERISA excessive-fee class actions have produced settlements with Boeing ($57 million), MIT ($18.1 million), and Emory University ($16.75 million), among many others, and fee comparisons of this kind are a standard part of the evidentiary foundation in such cases.

DB plan funding rules and Schedule SB

The funding rules governing single-employer defined benefit plans are set out in IRC Section 430 (the parallel ERISA provision is Section 303), enacted by the Pension Protection Act of 2006. The rules require plan sponsors to make annual minimum required contributions (MRCs) calculated through actuarial valuation. The valuation computes the present value of all accrued benefit liabilities (the funding target) using interest rates derived from investment-grade corporate bond yield curves specified by the IRS. The rates are published in three segments: a short-term rate (for benefits payable within five years), a mid-term rate (five to twenty years), and a long-term rate (beyond twenty years). Legislation has repeatedly smoothed these rates to reduce the volatility of required contributions during periods of low interest rates, beginning with MAP-21 in 2012 and continuing with the Highway and Transportation Funding Act (HATFA) of 2014, the Bipartisan Budget Act of 2015, and the American Rescue Plan Act of 2021.
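The three-segment discounting described above can be illustrated with a toy valuation. This is a minimal sketch, not an actuarial method: the segment rates and cash flows are hypothetical, payments are assumed to fall at the end of each year, and the boundary treatment (year 5 in the first segment, year 20 in the second) is an assumption.

```python
def segment_rate(years_out, short_rate, mid_rate, long_rate):
    """Pick the discount rate for a payment due `years_out` years from now."""
    if years_out <= 5:
        return short_rate
    if years_out <= 20:
        return mid_rate
    return long_rate


def funding_target(expected_payments, short_rate, mid_rate, long_rate):
    """Present value of expected benefit payments, keyed by years from now."""
    total = 0.0
    for years_out, payment in expected_payments.items():
        rate = segment_rate(years_out, short_rate, mid_rate, long_rate)
        total += payment / (1 + rate) ** years_out
    return total


# Hypothetical cash flows and segment rates, for illustration only.
payments = {3: 1_000.0, 10: 1_000.0, 25: 1_000.0}
target = funding_target(payments, short_rate=0.050, mid_rate=0.052, long_rate=0.055)
lower_rates = funding_target(payments, short_rate=0.040, mid_rate=0.042, long_rate=0.045)

print(f"Funding target:              ${target:,.2f}")
print(f"With rates one point lower:  ${lower_rates:,.2f}")
```

The second figure is larger than the first: lower discount rates raise the funding target, which is why rate smoothing has such a direct effect on required contributions.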

The adjusted funding target attainment percentage (AFTAP) is the central metric driving benefit restriction triggers. AFTAP equals adjusted plan assets divided by the adjusted funding target, each calculated using the corridor-smoothed interest rates. If AFTAP falls below 80 percent, the plan is restricted from making certain plan amendments that would increase benefits. Below 60 percent, benefit accruals must cease and lump-sum distributions are prohibited. These restrictions are automatic — they apply by operation of law when the AFTAP drops below the threshold, and the plan administrator must notify participants. Schedule SB discloses the AFTAP, the underlying asset and liability values, and the actuarial assumptions, enabling outside verification of the plan's compliance with the restriction thresholds.
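The threshold logic can be sketched directly from the description above. The function names and restriction labels are illustrative, the asset and liability figures are hypothetical, and the sketch encodes only the two thresholds named in the text rather than the full set of statutory restrictions.

```python
def aftap_pct(adjusted_assets, adjusted_funding_target):
    """Adjusted funding target attainment percentage."""
    return 100 * adjusted_assets / adjusted_funding_target


def benefit_restrictions(pct):
    """Restrictions triggered at a given AFTAP, per the thresholds in the text."""
    restrictions = []
    if pct < 80:
        restrictions.append("benefit-increasing amendments restricted")
    if pct < 60:
        restrictions.append("benefit accruals cease")
        restrictions.append("lump-sum distributions prohibited")
    return restrictions


# Hypothetical plan: $72M adjusted assets against a $100M adjusted funding target.
pct = aftap_pct(72_000_000, 100_000_000)
print(f"AFTAP = {pct:.1f}% -> {benefit_restrictions(pct) or 'no restrictions'}")
```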

Plans in “at-risk” status (broadly, those whose funding target attainment percentage is below 80 percent under standard assumptions and below 70 percent under the at-risk assumptions) must compute their funding target using more conservative actuarial assumptions that typically increase liabilities, requiring higher minimum contributions. For multiemployer plans under the Pension Protection Act, the analogous zones (endangered at below 80 percent funded, critical at below 65 percent) trigger rehabilitation plan requirements and restrictions on benefit improvements, with the specific rules disclosed on Schedule MB.

Withdrawal liability is the multiemployer plan counterpart to the single-employer minimum required contribution. When an employer withdraws from a multiemployer plan — either completely (closing all covered operations) or partially (reducing covered employment below a threshold) — it becomes liable for its proportionate share of the plan's unfunded vested benefits. Withdrawal liability calculations are complex actuarial determinations disclosed on Schedule MB and often become the subject of arbitration between the withdrawing employer and the plan trustees. For companies in industries with large multiemployer plan exposure — grocery, trucking, construction — withdrawal liability can be a material contingent liability that is difficult to assess from public financial statements alone. Schedule MB provides the actuarial inputs.

The large plan audit requirement

Plans with 100 or more participants at the beginning of the plan year must engage an independent qualified public accountant to audit the plan's financial statements and attach the audit report to the Form 5500 filing. The audit requirement is intended to provide an independent check on the accuracy of the financial information disclosed in Schedule H and to detect errors, omissions, and potential fraud. The Department of Labor has authority under ERISA Section 502(c)(2) to assess civil penalties, adjusted annually for inflation and exceeding $2,500 per day in recent years, against administrators who fail to file a complete annual report, including a required audit report. Separately, the IRS can assess $250 per day (up to $150,000 per return) for late filing.

The auditing standards applicable to ERISA plan audits were significantly updated when the AICPA issued Statement on Auditing Standards No. 136, “Forming an Opinion and Reporting on Financial Statements of Employee Benefit Plans Subject to ERISA,” in 2019, effective for audits of plan years ending on or after December 15, 2021. SAS 136 strengthened requirements for auditor engagement, communication with plan management and those charged with governance, and the content of the audit report. Specifically, it replaced the former “limited-scope audit” (in which auditors were permitted to disclaim an opinion on investment information certified by a bank or insurance carrier) with a more rigorous “ERISA Section 103(a)(3)(C) audit” that still permits reliance on certified investment information but requires the auditor to form and express an opinion on the financial statements as a whole.

The DOL has repeatedly found significant deficiencies in the quality of plan audits. A 2015 study by EBSA's Office of the Chief Accountant found that 39 percent of plan audits examined had major deficiencies that put $653 billion in plan assets at risk. The DOL has responded with audit quality enforcement initiatives targeting accounting firms with poor track records in plan auditing, and with civil penalty assessments against plans with deficient audit reports. The audit firm's identity is disclosed on the Form 5500, enabling systematic analysis of audit quality by firm, a capability the DOL uses internally and that journalists and researchers can replicate using the public dataset.

EFAST2 and bulk data access

EFAST2 — ERISA Filing Acceptance System 2 — is the electronic filing and public disclosure platform operated by the DOL. All Form 5500 filings since 2010 have been submitted and disclosed through EFAST2. The public search interface at efast.dol.gov/5500Search/ allows searching by employer identification number, plan name, or plan year. Search results return the PDF filing and, for recent years, structured XML. The portal is adequate for individual plan lookups but not for population-level analysis.

For bulk research, the DOL's Employee Benefits Security Administration publishes annual Form 5500 research datasets at dol.gov/agencies/ebsa … /5500-datasets. The files are hosted at askebsa.dol.gov as compressed CSV archives — one file per schedule per plan year. The main filing table (F_5500) has one row per plan filing; Schedule H (F_SCH_H), Schedule C (F_SCH_C), Schedule A (F_SCH_A), Schedule SB (F_SCH_SB), Schedule MB (F_SCH_MB), and others are joinable on the acknowledgment ID. No API key is required. Data is available from plan year 2009 onward, typically published six to nine months after the calendar plan year ends. Plans filing on extension may not appear in the initial annual release.

Commercial providers have built richer analytical layers on top of the Form 5500 data. BrightScope, acquired by Strategic Insight in 2016, operates a database that cross-references Form 5500 filings with mutual fund holdings data to compute plan-level expense ratios net of revenue sharing. ProPublica has indexed Form 5500 data for specific analyses of fee concentration and fiduciary outcomes. The IRS also publishes Form 5500 research files with additional EIN-level panel data useful for longitudinal analysis. For regulatory enforcement purposes, DOL investigators use EFAST2 data to identify plans warranting audit, track delinquent filers, and prioritize enforcement resources based on plan size, industry, and financial characteristics.

Python: downloading Form 5500 data and analyzing asset concentration

The following script downloads the DOL Form 5500 annual research files for a given plan year, filters to defined contribution plans with a 401(k) feature and at least 1,000 participants, ranks the top-50 largest plans by total plan assets, computes the distribution of all 401(k) plans across five asset-size tiers, and joins Schedule C data to compute median administrative fee rates in basis points by plan asset tier. Files are downloaded directly from askebsa.dol.gov. No credentials are required.

import requests
import csv
import io
import zipfile
import collections

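# -----------------------------------------------------------------------
# DOL Form 5500 Research Files
# Dataset page:
#   https://www.dol.gov/agencies/ebsa/employers-and-advisers/plan-administration-and-compliance/reporting-and-filing/form-5500/5500-datasets
# Direct download base (hosted by askebsa.dol.gov):
#   https://askebsa.dol.gov/FOIA%20Files/<YEAR>/
# Files use latin-1 encoding. No API key required.
# NOTE: archive names and column names are used as given in this article and
# may differ by plan year (for example, a "Latest" subfolder or file suffix,
# or asset totals stored in the Schedule H file rather than in F_5500).
# Check the dataset page and the data dictionary for YEAR before running.
# -----------------------------------------------------------------------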

YEAR = 2023
BASE = f"https://askebsa.dol.gov/FOIA%20Files/{YEAR}/"

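def fetch_zip_csv(filename):
    """Download a zipped CSV from the DOL FOIA server and return rows as dicts.

    Column names are lowercased so lookups work regardless of header case.
    """
    url = BASE + filename
    print(f"Downloading {url} ...")
    r = requests.get(url, timeout=180)
    r.raise_for_status()
    with zipfile.ZipFile(io.BytesIO(r.content)) as z:
        inner = z.namelist()[0]  # each archive is expected to hold one CSV
        with z.open(inner) as f:
            text = io.TextIOWrapper(f, encoding="latin-1", newline="")
            reader = csv.DictReader(text)
            return [
                {(key or "").lower(): value for key, value in row.items()}
                for row in reader
            ]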

def safe_float(val):
    try:
        return float(val) if val else None
    except (ValueError, TypeError):
        return None

# -----------------------------------------------------------------------
# Part 1: Download F_5500 main filing table and filter to DC 401(k) plans
# type_pension_bnft_code is a string of two-character feature codes. In the
# Form 5500 instructions, '2J' marks a 401(k) feature ('2C' is money purchase).
# -----------------------------------------------------------------------
filings = fetch_zip_csv(f"F_5500_{YEAR}.zip")
print(f"Total filings: {len(filings):,}")

CODE_401K = "2J"

# Every filing with a 401(k) feature, regardless of size (reused in Parts 3-4)
all_k401 = [
    row for row in filings
    if CODE_401K in (row.get("type_pension_bnft_code") or "")
]

# Subset with at least 1,000 participants at end of year
k401 = [
    row for row in all_k401
    if (safe_float(row.get("tot_partcp_eoy_cnt")) or 0) >= 1000
]
print(f"401(k) plans with 1,000+ participants: {len(k401):,}")

# -----------------------------------------------------------------------
# Part 2: Rank top-50 largest 401(k) plans by end-of-year total assets
# (tot_plan_asset_eoy_amt from the main F_5500 table)
# -----------------------------------------------------------------------
TOP_N = 50
ranked = sorted(
    [r for r in k401 if safe_float(r.get("tot_plan_asset_eoy_amt"))],
    key=lambda r: safe_float(r["tot_plan_asset_eoy_amt"]),
    reverse=True,
)

print(f"\nTop {TOP_N} largest 401(k) plans by total assets:")
for i, row in enumerate(ranked[:TOP_N], 1):
    name = (row.get("plan_name") or "").strip()
    assets = safe_float(row["tot_plan_asset_eoy_amt"])
    parts = int(safe_float(row.get("tot_partcp_eoy_cnt")) or 0)
    ein = row.get("ein", "")
    print(f"  {i:>2}. {name[:55]:<55} EIN={ein}  assets=${assets/1e9:.1f}B  n={parts:,}")

# -----------------------------------------------------------------------
# Part 3: Distribution of plan assets by tier (asset histogram)
# Tiers: <$1M, $1M-$10M, $10M-$100M, $100M-$1B, $1B+
# -----------------------------------------------------------------------
TIERS = [
    ("<$1M",       0,         1e6),
    ("$1M-$10M",   1e6,       10e6),
    ("$10M-$100M", 10e6,      100e6),
    ("$100M-$1B",  100e6,     1e9),
    ("$1B+",       1e9,       float("inf")),
]
tier_counts = {label: 0 for label, _, _ in TIERS}
tier_assets = {label: 0.0 for label, _, _ in TIERS}

# all_k401 (built in Part 1) covers 401(k) plans of every size.
for row in all_k401:
    assets = safe_float(row.get("tot_plan_asset_eoy_amt"))
    if assets is None or assets < 0:
        continue
    for label, lo, hi in TIERS:
        if lo <= assets < hi:
            tier_counts[label] += 1
            tier_assets[label] += assets
            break

print("\n401(k) plan asset distribution:")
print(f"  {'Tier':<15} {'Plans':>8}  {'Total Assets':>14}")
for label, _, _ in TIERS:
    print(f"  {label:<15} {tier_counts[label]:>8,}  ${tier_assets[label]/1e9:>12.1f}B")

# -----------------------------------------------------------------------
# Part 4: Join Schedule C admin-fee data to compute fee rate by tier
# F_SCH_C has one row per service provider per plan. We sum direct_comp_amt
# per plan (ack_id), then join back to the main filing to compute
# fee rate = total direct fees / tot_plan_asset_eoy_amt (in basis points).
# Indirect compensation (revenue sharing and similar) is not included, so
# these rates understate total plan cost.
# -----------------------------------------------------------------------
print("\nDownloading Schedule C (service provider fees)...")
sch_c = fetch_zip_csv(f"F_SCH_C_{YEAR}.zip")
print(f"Schedule C rows: {len(sch_c):,}")

# Sum direct compensation per plan (keyed on ack_id)
plan_fees = collections.defaultdict(float)
for row in sch_c:
    amt = safe_float(row.get("direct_comp_amt"))
    if amt and amt > 0:
        plan_fees[row.get("ack_id", "")] += amt

# Build lookup from ack_id -> filing row for 401(k) plans
k401_by_ack = {row.get("ack_id", ""): row for row in all_k401}

# Compute fee rate by asset tier
tier_bps = {label: [] for label, _, _ in TIERS}
for ack_id, total_fees in plan_fees.items():
    row = k401_by_ack.get(ack_id)
    if not row:
        continue
    assets = safe_float(row.get("tot_plan_asset_eoy_amt"))
    if not assets or assets <= 0:
        continue
    bps = (total_fees / assets) * 10000  # convert to basis points
    for label, lo, hi in TIERS:
        if lo <= assets < hi:
            tier_bps[label].append(bps)
            break

print("\nMedian Schedule C admin fee rate (basis points) by plan asset tier:")
for label, _, _ in TIERS:
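    vals = sorted(tier_bps[label])
    if vals:
        # True median: average the two middle values when the count is even
        mid = len(vals) // 2
        median_bps = vals[mid] if len(vals) % 2 else (vals[mid - 1] + vals[mid]) / 2
        print(f"  {label:<15} median={median_bps:.1f} bps  n={len(vals):,}")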
    else:
        print(f"  {label:<15} no data")

The results from running this analysis on recent plan years consistently show the same pattern: fee rates in basis points fall sharply as plan asset size increases. Plans in the sub-$1 million asset tier typically show median Schedule C-reported admin fees exceeding 100 basis points annually; plans in the $1 billion and above tier routinely show median rates below 10 basis points. This disparity reflects recordkeeping economics (per-participant costs are largely fixed, so larger plans spread them across more assets), negotiating leverage, and the availability of institutional-class investment vehicles unavailable to smaller plans. The top-50 ranking is dominated by plans sponsored by large technology companies, financial institutions, and industrial corporations, with assets ranging from tens of billions down to a few billion depending on the year.

Practical uses and limitations

Form 5500 data supports a wide range of practical research applications beyond regulatory enforcement. Corporate finance analysts use Schedule SB data to verify and extend the pension disclosures in annual reports and 10-K filings: public companies must disclose pension obligations under ASC 715, but the actuarial assumptions and sensitivities in those disclosures are presented on the company's own terms. Schedule SB provides a standardized actuarial filing signed by an enrolled actuary under penalty of perjury, covering the same underlying plan. Discrepancies between the financial statement presentation and the Schedule SB data can reveal conservative or aggressive assumption choices.

Journalists investigating retirement security use the longitudinal Form 5500 record to document the shift from defined benefit to defined contribution coverage in specific industries, identify employers with the most severely underfunded pension obligations before they reach PBGC termination, and track the trajectory of multiemployer plans in critical status. Because the data covers the full population of large plans annually, it supports before-and-after analyses across economic cycles that would be impossible with survey or disclosure data alone.

Several practical limitations must be accounted for in any systematic analysis. Filing delays are significant: the standard deadline is the last day of the seventh month after the plan year ends, and a Form 5558 extension of two and a half months, which pushes the deadline to nine and a half months after plan year end, is common. As a result, a newly received filing can describe plan conditions from roughly eighteen months earlier. For plans below the large-plan threshold, the Form 5500-SF data is less detailed and there is no requirement for an independent audit. Actuarial assumptions in Schedule SB are disclosed but not standardized across plans, so comparing funding ratios across DB plans requires normalizing for discount rate and mortality assumption differences. Finally, the coding of plan types and benefit features has evolved across the years the dataset covers, requiring careful attention to the DOL's annual data dictionaries and technical appendices to construct longitudinal panels correctly.
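When judging how stale a given filing may be, it helps to compute the deadlines from the plan year end date. The sketch below is a minimal illustration: the helper names `add_months` and `filing_deadlines` are hypothetical, the extension is assumed to be the two-and-a-half-month Form 5558 extension, and the rule that moves a deadline falling on a weekend or federal holiday to the next business day is deliberately ignored.

```python
import calendar
from datetime import date


def add_months(year: int, month: int, n: int) -> tuple[int, int]:
    """Return (year, month) shifted forward by n calendar months."""
    idx = year * 12 + (month - 1) + n
    return idx // 12, idx % 12 + 1


def filing_deadlines(plan_year_end: date) -> tuple[date, date]:
    """Standard and extended due dates for a plan year ending on plan_year_end.

    Assumption: the extension runs 2.5 months past the standard due date,
    which lands on the 15th of the third month after the standard due month.
    Weekend and holiday rollover is not modeled.
    """
    y, m = add_months(plan_year_end.year, plan_year_end.month, 7)
    standard = date(y, m, calendar.monthrange(y, m)[1])  # last day of 7th month
    ey, em = add_months(y, m, 3)
    extended = date(ey, em, 15)
    return standard, extended


standard, extended = filing_deadlines(date(2022, 12, 31))
print("standard:", standard.isoformat())   # 2023-07-31
print("extended:", extended.isoformat())   # 2023-10-15
```

Comparing a filing's received date against these two dates is one way to flag which records in a bulk file were filed on extension, provided the dataset exposes a received or filing date field.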

Despite these limitations, Form 5500 remains the most comprehensive public source of information on private-sector employee benefit plans in the United States. No other dataset covers the full population of large plans at this level of financial detail, updated annually, accessible without credentials, and going back to 2009 in structured form. For any research touching retirement security, fiduciary conduct, plan economics, or employer benefit obligations, it is the starting point.

Related writing: BLS Occupational Employment and Wage Statistics: mapping 800 occupations across 600 geographic areas — employer compensation costs connect directly to retirement benefit obligations; OEWS wage data provides the salary baselines that defined benefit formulas and 401(k) match structures are built on.

Related writing: SSA Social Security: the federal database behind $1.4 trillion in annual OASDI benefits — ERISA pension plans interact with Social Security through FICA integration, plan design offsets, and retirement income replacement rate analysis; the OASDI benefit formula and trust fund mechanics are the public-plan counterpart to private ERISA plan funding rules.