DOL Form 5500: The Annual Filing That Exposes Every Private Pension and 401(k) Plan
Nearly every ERISA-covered employee benefit plan must file an annual report in the Form 5500 series with the Department of Labor, and plans with 100 or more participants generally file the full Form 5500 with its schedules. Roughly 750,000 filings arrive each year, collectively disclosing the finances of every major private-sector retirement program in the country: 401(k) plans, traditional pensions, profit-sharing plans, 403(b) plans, and health and welfare plans. The DOL publishes all of it through the EFAST2 electronic filing system, with structured CSV datasets going back to 2009. The total assets represented exceed $10 trillion. Almost no one outside of ERISA litigation and investigative journalism uses the data.
This article covers what Form 5500 reports, the plan types it encompasses, the key data fields available, the schedules and what each one discloses, how to access the data in bulk, the scale of the dataset, what Schedule C reveals about 401(k) fees, what Schedule SB reveals about defined-benefit funding ratios, a Python snippet for computing average expense ratios by plan size, and how ERISA attorneys and journalists use the filings to investigate excessive fees, imprudent investments, and underfunded pension obligations.
What Form 5500 covers
The Form 5500 Annual Return/Report of Employee Benefit Plan was created by Congress as part of ERISA in 1974 and is jointly administered by the Department of Labor, the Internal Revenue Service, and the Pension Benefit Guaranty Corporation. Its purpose is transparency: Congress wanted a public record of every significant private-sector benefit plan, its financial condition, and the people responsible for running it.
Three broad categories of plans file:
Pension plans include both defined-benefit plans (traditional pensions that promise a specific monthly benefit at retirement based on salary and years of service) and defined-contribution plans, the most common of which are 401(k) plans (private-sector salary-deferral plans), 403(b) plans (nonprofits and public education), profit-sharing plans, money-purchase pension plans, and employee stock ownership plans (ESOPs). Defined-contribution plans do not promise a specific benefit — they promise only that the employer will contribute to individual participant accounts. The investment risk falls entirely on the participant.
Health and welfare plans include group health insurance, dental, vision, life insurance, disability, and other non-retirement employee benefits. These plans file a Form 5500 but use different schedules than pension plans, and many of the financial fields applicable to pension plans do not apply. This article focuses primarily on pension plans, where the financial data is most actionable.
Plans with fewer than 100 participants generally file the simplified Form 5500-SF (short form) rather than the full Form 5500 with all schedules. Some small plans are exempt from filing entirely, most notably welfare plans with fewer than 100 participants that are unfunded or fully insured. The detailed analysis enabled by Schedules C, H, SB, and MB applies primarily to large plans, meaning those with 100 or more participants at the beginning of the plan year.
Key data fields
The core Form 5500 filing captures identifying information and high-level plan characteristics. The fields most useful for research and analysis are:
EIN and plan number. Every filing is identified by the employer identification number of the plan sponsor combined with a three-digit plan number assigned by the sponsor. This EIN/plan-number pair is stable across years, enabling longitudinal tracking of a plan's financial trajectory. A single employer may maintain multiple plans, each with its own plan number.
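As a rough illustration of longitudinal tracking, the sketch below builds a normalized key from the EIN and plan number and groups filings by it. The record layout and the field names `ein`, `pn`, and `year` are hypothetical stand-ins for illustration, not the DOL's column names.

```python
from collections import defaultdict


def plan_key(ein, plan_number):
    """Return a normalized 'EIN-PN' key: nine-digit EIN, three-digit plan number."""
    ein_digits = "".join(ch for ch in str(ein) if ch.isdigit()).zfill(9)
    pn_digits = "".join(ch for ch in str(plan_number) if ch.isdigit()).zfill(3)
    return f"{ein_digits}-{pn_digits}"


def group_by_plan(filings):
    """Map each plan key to that plan's filings, sorted by plan year."""
    history = defaultdict(list)
    for filing in filings:
        history[plan_key(filing["ein"], filing["pn"])].append(filing)
    for rows in history.values():
        rows.sort(key=lambda r: r["year"])
    return dict(history)


# Hypothetical records: the same plan written two different ways.
sample_filings = [
    {"ein": "12-3456789", "pn": 1, "year": 2022},
    {"ein": "123456789", "pn": "001", "year": 2021},
]
plan_histories = group_by_plan(sample_filings)
```

Normalizing first matters because the same plan can appear with or without a hyphen in the EIN, or with the plan number stored as an integer.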
Plan name and type. The plan name as the sponsor has designated it, along with a series of coded checkboxes indicating the type of plan: defined benefit, defined contribution, 401(k) feature, ESOP feature, and so on. Plans may check multiple codes — a profit-sharing plan with a 401(k) feature is common.
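Because a plan can carry several codes at once, it helps to split the code field before testing for a feature. The sketch below assumes the codes arrive as one string of two-character codes with no separators; the codes in the example are placeholders, and their meanings should be taken from the Form 5500 instructions for the year in question.

```python
def split_codes(code_field):
    """Split a concatenated code string such as '2E2F2J' into two-character codes."""
    text = (code_field or "").strip().upper()
    return [text[i:i + 2] for i in range(0, len(text), 2)]


def has_feature(code_field, code):
    """Return True when the given two-character code appears in the field."""
    return code.upper() in split_codes(code_field)


# Placeholder code string for illustration only.
example_codes = split_codes("2E2F2J")
```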
Participant counts. The number of active participants (employees currently accruing benefits), retired or separated participants receiving benefits, and other participants (vested former employees not yet receiving benefits). The total participant count determines whether a plan meets the 100-participant threshold requiring a full audit and the complete Form 5500 with all schedules.
Total plan assets. The fair market value of all plan assets at the end of the plan year. For defined-contribution plans, this equals the sum of all participant account balances. For defined-benefit plans, it is the market value of the trust's investment portfolio, which must be compared against the plan's liabilities to assess funding status.
Contributions received. Total employer and employee contributions during the plan year. For 401(k) plans, this includes both employer matching contributions and employee salary deferrals. For defined-benefit plans, this is the employer's funding contribution.
Benefit payments and distributions. Total amounts paid to participants and beneficiaries during the year, including retirement distributions, disability payments, death benefits, and hardship withdrawals.
Investment return. Net gain or loss from investments during the plan year, reflecting both realized and unrealized changes in portfolio value. Defined-contribution plans disaggregate this into individual account changes; the aggregate figure appears in Schedule H.
Plan administrator and service providers. The name, address, and EIN of the plan administrator (often the employer itself), the trustee, and key service providers. Schedule C contains the detailed service-provider compensation data for large plans.
Auditor. Large plans must engage an independent qualified public accountant to audit the plan's financial statements. The auditor's name and the type of audit opinion (qualified or unqualified) appear in the filing. A qualified audit opinion, meaning the auditor found exceptions, is a significant red flag.
The schedules
The Form 5500 is a modular filing system. Different schedules attach to the core form depending on plan type and size. Understanding which schedule contains which data is essential for using the DOL's bulk datasets, since each schedule is a separate downloadable CSV file.
Schedule A — Insurance Information. Required for plans that use an insurance company to fund benefits or hold plan assets. Discloses insurance contracts, premiums paid, and commissions paid to insurance agents or brokers. For health and welfare plans, Schedule A captures the insurance carrier, the total premiums, and whether the carrier retained any charges as profit. For defined-contribution plans using group annuity contracts, Schedule A reports the annuity premiums and the separate account balances.
Schedule C — Service Provider Information. The most forensically useful schedule for 401(k) fee analysis. Required for large plans, Schedule C must list every service provider that received $5,000 or more in direct or indirect compensation from the plan during the year. This includes recordkeepers, investment managers, third-party administrators, consultants, brokers, and attorneys. For each provider, the schedule reports the provider's name, EIN, the services rendered, and the compensation paid — broken out between direct compensation (paid from plan assets) and indirect compensation (paid by third parties such as mutual fund companies via revenue sharing). Schedule C is the primary source for identifying excessive fee arrangements, revenue-sharing conflicts of interest, and undisclosed compensation structures.
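A minimal sketch of how the direct and indirect amounts might be rolled up per provider follows. The row layout and the field names `provider_ein`, `direct_comp`, and `indirect_comp` are hypothetical; the actual Schedule C extract uses its own column names, documented in the DOL data dictionary.

```python
from collections import defaultdict

REPORTING_THRESHOLD = 5000  # dollars, the Schedule C listing threshold described above


def summarize_providers(rows):
    """Total direct and indirect compensation per provider EIN.

    Providers whose combined total falls below the threshold are dropped,
    mirroring the reporting rule.
    """
    totals = defaultdict(lambda: {"direct": 0.0, "indirect": 0.0})
    for row in rows:
        entry = totals[row["provider_ein"]]
        entry["direct"] += row.get("direct_comp", 0.0)
        entry["indirect"] += row.get("indirect_comp", 0.0)

    summary = {}
    for ein, entry in totals.items():
        total = entry["direct"] + entry["indirect"]
        if total >= REPORTING_THRESHOLD:
            summary[ein] = {
                "direct": entry["direct"],
                "indirect": entry["indirect"],
                "total": total,
                "indirect_share": entry["indirect"] / total,
            }
    return summary


# Hypothetical rows for one plan year.
example_rows = [
    {"provider_ein": "111111111", "direct_comp": 30000.0, "indirect_comp": 10000.0},
    {"provider_ein": "111111111", "direct_comp": 0.0, "indirect_comp": 10000.0},
    {"provider_ein": "222222222", "direct_comp": 2000.0, "indirect_comp": 1000.0},
]
provider_summary = summarize_providers(example_rows)
```

The `indirect_share` figure is a convenient screen: a provider paid mostly through third parties is a natural starting point for a revenue-sharing review.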
Schedule H — Financial Information (Large Plans). The full income statement and balance sheet for plans with 100 or more participants. Schedule H captures total plan assets at beginning and end of year, contributions, benefit payments, investment gains and losses, administrative expenses, transfers, and ending net assets. It also includes the plan's investment allocation by asset class and a series of questions about plan operations, including whether the plan engaged in any prohibited transactions. Schedule H is the primary source for computing investment returns, expense ratios, and asset allocation trends.
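The sketch below shows one way the Schedule H totals could be turned into an expense ratio, an approximate investment return, and a reconciliation check. The class and its field names are hypothetical, and the return uses a simple Dietz approximation (net flows assumed to arrive mid-year), which is a swapped-in convention rather than anything the form prescribes.

```python
from dataclasses import dataclass


@dataclass
class ScheduleHSummary:
    assets_boy: float       # total assets, beginning of year
    assets_eoy: float       # total assets, end of year
    contributions: float
    benefits_paid: float
    investment_gain: float  # net investment gain or loss
    admin_expenses: float


def expense_ratio(s):
    """Administrative expenses divided by end-of-year assets."""
    return s.admin_expenses / s.assets_eoy if s.assets_eoy > 0 else None


def approx_return(s):
    """Simple Dietz return: gain over beginning assets plus half of net flows."""
    net_flow = s.contributions - s.benefits_paid - s.admin_expenses
    base = s.assets_boy + net_flow / 2
    return s.investment_gain / base if base > 0 else None


def reconciliation_gap(s):
    """Reported end-of-year assets minus the value implied by the flows."""
    implied = (s.assets_boy + s.contributions + s.investment_gain
               - s.benefits_paid - s.admin_expenses)
    return s.assets_eoy - implied


# Hypothetical plan year.
example = ScheduleHSummary(
    assets_boy=1_000_000, assets_eoy=1_125_000, contributions=100_000,
    benefits_paid=50_000, investment_gain=80_000, admin_expenses=5_000,
)
```

A reconciliation gap far from zero usually points to transfers or other items not captured in this simplified layout, and is worth checking before trusting the derived ratios.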
Schedule MB — Actuarial Information for Multiemployer Plans. Required for multiemployer defined-benefit plans. Reports the actuarial present value of accumulated plan benefits, the fair value of plan assets, the funding ratio, the actuarial assumptions used (discount rate, salary growth, mortality table), and whether the plan is in “endangered” or “critical” status under the Pension Protection Act. Plans in critical status are commonly called “red zone” plans; they must implement a rehabilitation plan and face restrictions on benefit increases and lump-sum distributions.
Schedule SB — Actuarial Information for Single-Employer Plans. The equivalent of Schedule MB for single-employer defined-benefit plans. Schedule SB captures the actuarial value of plan assets, the funding target (present value of all accrued benefit liabilities), the funding ratio (assets divided by funding target), the plan's adjusted funding target attainment percentage (AFTAP), the actuarial assumptions, and the minimum required contribution. It also reports whether the plan is in “at-risk” status, which is a separate statutory test. A plan with an AFTAP below 80 percent faces limits on lump-sum distributions and benefit improvements, and below 60 percent the restrictions become severe. Schedule SB is the primary source for identifying underfunded pension plans before they reach crisis, or before their sponsors file for bankruptcy.
Accessing the data
The DOL provides Form 5500 data through two main channels. For individual plan lookups, the EFAST2 public disclosure portal at efast.dol.gov allows searching by plan name, EIN, or plan number. The portal returns the PDF filing and, for recent years, structured XML data. This is adequate for looking up a specific plan but not for systematic research.
For bulk analysis, the DOL's Employee Benefits Security Administration (EBSA) publishes the Form 5500 Annual Datasets page at dol.gov/agencies/ebsa/researchers/analysis/form-5500-datasets. Each plan year is available as a set of compressed CSV files, one per schedule: the main F_5500 filing table, plus separate files for F_SCH_A, F_SCH_C, F_SCH_H, F_SCH_MB, F_SCH_SB, and others. The files are hosted at askebsa.dol.gov and can be downloaded directly. Data is available from plan year 2009 onward, with the most recent year typically published six to nine months after the plan year ends (most plans file on a calendar-year basis with a July 31 deadline, though extensions are common).
The EFAST2 system also exposes a public API for querying individual filings by EIN, plan number, or plan year. The API returns structured JSON and is documented at the EFAST2 developer portal. It is suitable for targeted lookups but rate-limited for bulk extraction — the annual CSV datasets are the correct tool for population-level analysis.
Scale of the dataset
In a typical filing year, approximately 750,000 Form 5500 and Form 5500-SF filings are submitted. The breakdown reflects the structure of the American retirement system: the large majority are defined-contribution plans, primarily 401(k) plans, which have largely displaced traditional pensions in the private sector since the 1980s. Defined-benefit plans still account for a significant share of assets despite covering far fewer plans, because they remain common in large corporations, unionized industries, and government-adjacent sectors.
Total plan assets across all filings consistently exceed $10 trillion in recent years, with defined-contribution plans holding the majority. The 401(k) system alone holds approximately $7 to $8 trillion depending on the year and market conditions. These figures make the Form 5500 dataset one of the most comprehensive views available of American household wealth held outside of direct individual ownership, surpassing what can be inferred from IRS tax data or the Federal Reserve's Flow of Funds accounts in plan-level granularity.
What Schedule C reveals about 401(k) fees
The 401(k) fee disclosure enabled by Schedule C is one of the most practically significant uses of Form 5500 data. Before the DOL strengthened fee disclosure rules in 2012, many plan sponsors — particularly smaller employers — had little visibility into the indirect compensation their recordkeepers and investment managers were receiving via revenue sharing from mutual fund companies. A fund company might pay a recordkeeper 25 or 50 basis points annually on assets held in the fund, a payment that effectively came from participants' accounts but was not itemized anywhere visible to the plan sponsor or participants.
Schedule C makes these arrangements visible. For each service provider receiving $5,000 or more, the schedule captures both direct compensation (paid from plan assets explicitly) and indirect compensation (paid by third parties). Comparing Schedule C across plans of similar size and asset level reveals whether a plan's fee structure is competitive or excessive. Plans with high indirect compensation and low direct compensation are often using bundled arrangements where the recordkeeper selects the investment menu from funds that pay the highest revenue sharing — an arrangement that is legal but creates conflicts of interest that ERISA fiduciaries are required to manage.
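The peer comparison described above can be sketched in a few lines. This is an illustrative approach with made-up numbers, not a litigation-grade benchmark; the function names and the choice of fee per participant as the metric are assumptions.

```python
import statistics


def fee_per_participant(direct, indirect, participants):
    """Total reported compensation divided by participant count."""
    return (direct + indirect) / participants


def compare_to_peers(target_fee, peer_fees):
    """Compare one plan's per-participant fee with a list of peer fees."""
    peer_median = statistics.median(peer_fees)
    cheaper = sum(1 for fee in peer_fees if fee < target_fee)
    return {
        "peer_median": peer_median,
        "ratio_to_median": target_fee / peer_median,
        "share_of_peers_cheaper": cheaper / len(peer_fees),
    }


# Hypothetical target plan and peer group of similar size.
target = fee_per_participant(direct=40_000, indirect=60_000, participants=1_000)
comparison = compare_to_peers(target, [40, 50, 60, 70, 120])
```

In practice the peer group should be restricted to plans of similar participant count and asset level, since both drive what a competitive fee looks like.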
ERISA class-action litigation has used Schedule C data extensively. When attorneys allege that a plan sponsor selected retail-class mutual funds when cheaper institutional-class shares were available, or that the plan's recordkeeper was receiving excessive compensation, Schedule C provides the documentary foundation. The litigation against university 403(b) plans that proliferated in the mid-2010s relied heavily on Schedule C comparisons showing that plans at peer institutions paid dramatically lower fees for equivalent services.
What Schedule SB reveals about pension funding
For defined-benefit pension analysis, Schedule SB is the primary instrument. The schedule's funding ratio — actuarial value of assets divided by funding target — provides the headline metric of plan health, but the details are equally important.
The discount rate used to compute the funding target is regulated by the IRS but involves a range of permissible assumptions. A plan using a higher discount rate will show a lower present value of liabilities and therefore a healthier funding ratio than the same plan using a lower rate. Comparing plans requires normalizing for actuarial assumptions, which Schedule SB makes possible because it requires disclosure of the specific rates used.
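To make the discount-rate effect concrete, the sketch below values the same hypothetical benefit stream at two flat rates. Real funding targets are computed by the plan's actuary using prescribed rates and mortality assumptions, so the single flat rate and level payment stream here are simplifying assumptions.

```python
def present_value(annual_payment, years, discount_rate):
    """Present value of a level payment made at the end of each year."""
    return sum(
        annual_payment / (1 + discount_rate) ** t
        for t in range(1, years + 1)
    )


def funding_ratio(assets, liability):
    """Assets divided by the present value of liabilities."""
    return assets / liability


# Hypothetical plan: $1,000,000 of benefits a year for 20 years, $10,000,000 in assets.
liability_at_5 = present_value(1_000_000, 20, 0.05)
liability_at_6 = present_value(1_000_000, 20, 0.06)
ratio_at_5 = funding_ratio(10_000_000, liability_at_5)
ratio_at_6 = funding_ratio(10_000_000, liability_at_6)
```

With these inputs, one percentage point of discount rate moves the funding ratio from roughly 80 percent to roughly 87 percent without any change in the plan's assets or promised benefits.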
The adjusted funding target attainment percentage (AFTAP) is the metric with immediate legal consequences. Plans with an AFTAP below 80 percent face restrictions on lump-sum distributions and plan amendments that increase benefits. Plans below 60 percent face severe restrictions including a ban on most lump-sum distributions. Plan sponsors must notify participants of their AFTAP, and the Schedule SB disclosure makes the underlying calculation available for verification.
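The two thresholds lend themselves to a simple screening function. The sketch below encodes only the 80 and 60 percent cutoffs described above; the tier labels are informal names chosen for illustration, not statutory terms.

```python
def aftap_tier(aftap_percent):
    """Classify a plan by AFTAP using the 80 and 60 percent thresholds."""
    if aftap_percent < 60:
        return "severe restrictions"
    if aftap_percent < 80:
        return "partial restrictions"
    return "no threshold restrictions"


# Hypothetical plans keyed by name.
example_plans = {"Plan A": 92.5, "Plan B": 74.0, "Plan C": 58.3}
tiers = {name: aftap_tier(pct) for name, pct in example_plans.items()}
```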
The minimum required contribution reported on Schedule SB determines the amount the plan sponsor must contribute to avoid an excise tax. A company that is failing to make its minimum required contribution is both accumulating an excise tax liability and allowing its pension to fall further behind on funding. Tracking this field over multiple years for a plan whose sponsor is in financial difficulty provides early warning of a potential plan termination and PBGC claim.
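A multi-year screen for missed contributions might look like the following sketch. The record layout and the field names `year`, `minimum_required`, and `contributed` are hypothetical, and the streak count assumes one record per consecutive plan year.

```python
def shortfalls(history):
    """Return (year, shortfall) pairs for years where contributions fell short."""
    result = []
    for rec in sorted(history, key=lambda r: r["year"]):
        gap = rec["minimum_required"] - rec["contributed"]
        if gap > 0:
            result.append((rec["year"], gap))
    return result


def longest_shortfall_streak(history):
    """Length of the longest run of consecutive shortfall years."""
    best = streak = 0
    for rec in sorted(history, key=lambda r: r["year"]):
        if rec["contributed"] < rec["minimum_required"]:
            streak += 1
            best = max(best, streak)
        else:
            streak = 0
    return best


# Hypothetical contribution history for one plan.
example_history = [
    {"year": 2019, "minimum_required": 500_000, "contributed": 500_000},
    {"year": 2020, "minimum_required": 550_000, "contributed": 400_000},
    {"year": 2021, "minimum_required": 600_000, "contributed": 300_000},
    {"year": 2022, "minimum_required": 650_000, "contributed": 650_000},
]
```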
Python: downloading Form 5500 data and computing 401(k) expense ratios
The following script downloads the Form 5500 annual dataset for a given year, filters to plans with a 401(k) feature, and computes median and mean expense ratios (total operating expenses divided by total plan assets) bucketed by participant count. The DOL hosts the files at askebsa.dol.gov as zip-compressed CSVs. No API key is required.
import collections
import csv
import io
import statistics
import zipfile

import requests

# DOL publishes annual Form 5500 datasets as zipped CSVs.
# Dataset page: https://www.dol.gov/agencies/ebsa/researchers/analysis/form-5500-datasets
# Change YEAR to fetch prior years (data available from 2009 onward).
YEAR = 2023
# Assumed folder and file naming ("Latest" folder, "_Latest" suffix); confirm
# the exact download links on the dataset page before running.
BASE_URL = f"https://askebsa.dol.gov/FOIA%20Files/{YEAR}/Latest/"
MAIN_TABLE = f"F_5500_{YEAR}_Latest.zip"

# The main filing table (F_5500) contains one row per plan filing.
# The Schedule H table (F_SCH_H) contains financial information for large plans.
# The Schedule C table (F_SCH_C) contains service-provider compensation.

# Column names used below, lowercased. They are assumptions: check them
# against the DOL data dictionary for YEAR. The financial fields are
# reported on Schedule H, so they may have to be read from F_SCH_H and
# joined to F_5500 on ACK_ID.
COL_PLAN_CODES = "type_pension_bnft_code"
COL_PARTICIPANTS = "tot_partcp_eoy_cnt"   # participant count, end of year
COL_ASSETS = "tot_plan_asset_eoy_amt"     # total plan assets, end of year
COL_EXPENSES = "tot_oper_xpns_amt"        # total operating expenses

BUCKET_LABELS = ["small (<100)", "mid (100-999)", "large (1000-9999)", "mega (10000+)"]


def fetch_table(filename):
    """Download one zipped CSV and return its rows as dicts with lowercase keys."""
    r = requests.get(BASE_URL + filename, timeout=120)
    r.raise_for_status()
    with zipfile.ZipFile(io.BytesIO(r.content)) as z:
        # Use the first CSV member of the archive.
        inner = next(n for n in z.namelist() if n.lower().endswith(".csv"))
        with z.open(inner) as f:
            reader = csv.DictReader(io.TextIOWrapper(f, encoding="latin-1"))
            # Lowercase the headers so lookups do not depend on header case.
            return [{(k or "").lower(): v for k, v in row.items()} for row in reader]


def safe_float(val):
    """Convert a CSV cell to float, returning None for blanks or bad values."""
    try:
        return float(val)
    except (TypeError, ValueError):
        return None


def size_bucket(participants):
    """Assign a plan to a size bucket by participant count."""
    if participants < 100:
        return BUCKET_LABELS[0]
    if participants < 1000:
        return BUCKET_LABELS[1]
    if participants < 10000:
        return BUCKET_LABELS[2]
    return BUCKET_LABELS[3]


print("Downloading F_5500 main filing table...")
filings = fetch_table(MAIN_TABLE)
print("Rows:", len(filings))

# Filter to plans reporting a 401(k) feature. The pension characteristics
# field is a run of two-character codes; "2J" marks a section 401(k)
# feature ("2C" is the code for a money purchase plan).
k401 = [row for row in filings if "2J" in (row.get(COL_PLAN_CODES) or "")]
print("401(k) plans:", len(k401))

# Bucket plans by participant count and collect an approximate expense
# ratio for each: total operating expenses divided by end-of-year assets.
buckets = collections.defaultdict(list)
for row in k401:
    parts = safe_float(row.get(COL_PARTICIPANTS))
    assets = safe_float(row.get(COL_ASSETS))
    expenses = safe_float(row.get(COL_EXPENSES))
    if parts is None or assets is None or expenses is None:
        continue  # skip rows missing any of the three fields
    if assets <= 0:
        continue  # avoid division by zero and negative-asset filings
    buckets[size_bucket(parts)].append(expenses / assets)

if not buckets:
    print("No usable rows; check the column names against the data dictionary.")

print(f"\nExpense ratio by plan size (401(k) plans, {YEAR}):")
for label in BUCKET_LABELS:
    vals = buckets[label]
    if vals:
        print(f"  {label}: median={statistics.median(vals):.4f}"
              f"  mean={statistics.mean(vals):.4f}  n={len(vals)}")
The results consistently show that larger plans have dramatically lower expense ratios. Mega plans with 10,000 or more participants routinely achieve expense ratios below 20 basis points, while small plans under 100 participants frequently show ratios exceeding 100 basis points. This disparity reflects economies of scale in recordkeeping and the negotiating leverage large plans possess when selecting investment vehicles — institutional-class fund shares versus retail shares with embedded distribution fees.
How journalists and attorneys use Form 5500
Investigative journalists covering retirement security have used Form 5500 data to document the decline of defined-benefit pensions in specific industries, identify the employers with the most severely underfunded pension obligations, track the growth of 401(k) assets relative to traditional pensions over time, and quantify how much of the defined-benefit system has been transferred to the PBGC. Because the data is longitudinal and covers the full population of large plans, it supports before-and-after comparisons that would be impossible using survey data or company disclosures alone.
ERISA plaintiff attorneys use Form 5500 filings in several distinct ways. In excessive-fee cases, attorneys compare a target plan's Schedule C compensation disclosures against those of peer plans of similar size and asset level to establish that the fees paid were above-market. In imprudent-investment cases, Schedule H investment return data allows comparison of the plan's net-of-fee returns against appropriate benchmarks. In cases involving self-dealing by plan fiduciaries, the service-provider identification on Schedule C and the related-party transaction questions on the main Form 5500 can reveal that the plan is paying excessive fees to entities connected to the plan's fiduciaries. The DOL's own Employee Benefits Security Administration uses Form 5500 data to target its audit and enforcement activities, flagging plans with unusual financial characteristics for follow-up examination.
Corporate finance analysts use Schedule SB data to verify and supplement the pension disclosures in annual reports and 10-K filings. Public companies must disclose their pension obligations under ASC 715 (formerly SFAS 87/132), but the actuarial assumptions embedded in those disclosures are sometimes difficult to compare across companies because of differences in presentation. Schedule SB provides a standardized format for the same underlying data, submitted to a government agency rather than chosen by the company's own actuaries and auditors for the most favorable presentation in the financial statements.
The combination of Form 5500 data with PBGC termination records and SEC pension footnotes creates a comprehensive picture of the defined-benefit pension system that no single source provides alone. Form 5500 shows what the plan looked like in its final years before a potential termination; PBGC records show the outcome and the shortfall; SEC filings show what the company was telling shareholders about its pension obligations in the same period. Cross-referencing these three sources is standard practice in ERISA litigation involving large corporate pension terminations.
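As a sketch of that cross-referencing step, the code below combines three lookups keyed by sponsor EIN. It assumes each source has already been reduced to a dictionary keyed by EIN, which in practice requires a crosswalk for SEC filers; all record contents shown are hypothetical.

```python
def cross_reference(form5500, pbgc, sec):
    """Combine three EIN-keyed lookups, starting from the Form 5500 records."""
    combined = {}
    for ein, filing in form5500.items():
        combined[ein] = {
            "form5500": filing,
            "pbgc": pbgc.get(ein),  # None when no termination record exists
            "sec": sec.get(ein),    # None when the sponsor is not an SEC filer
        }
    return combined


def fully_matched(combined):
    """EINs that appear in all three sources."""
    return sorted(
        ein for ein, rec in combined.items()
        if rec["pbgc"] is not None and rec["sec"] is not None
    )


# Hypothetical lookups.
form5500_by_ein = {"111111111": {"funding_ratio": 0.62}, "222222222": {"funding_ratio": 0.95}}
pbgc_by_ein = {"111111111": {"shortfall": 40_000_000}}
sec_by_ein = {"111111111": {"reported_obligation": 310_000_000}, "222222222": {"reported_obligation": 80_000_000}}

matched = fully_matched(cross_reference(form5500_by_ein, pbgc_by_ein, sec_by_ein))
```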
Limitations and practical considerations
Form 5500 data has several limitations that analysts must account for. The filing deadline is seven months after the plan year end (with extensions available up to nine and a half months), so the most recent data in the annual datasets may be one to two years behind current conditions for plans that file on extension. The data reflects what the plan sponsor reported, not what an independent auditor verified for plans below the large-plan threshold. The actuarial assumptions in Schedule SB are disclosed but not standardized — comparing funding ratios across plans requires normalizing for assumption differences. And the coding of plan types in the filing has changed over the years in ways that complicate longitudinal analysis without careful attention to the DOL's data dictionaries and technical appendices, which are published alongside each year's dataset.
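The deadline arithmetic can be sketched as follows, which helps when estimating how stale a given plan's latest filing might be. The function names are illustrative, and the sketch ignores any adjustment for weekends or holidays.

```python
import calendar
from datetime import date


def add_months(year, month, count):
    """Return (year, month) shifted forward by a number of months."""
    index = year * 12 + (month - 1) + count
    return index // 12, index % 12 + 1


def filing_deadline(plan_year_end):
    """Last day of the seventh month after the plan year ends."""
    year, month = add_months(plan_year_end.year, plan_year_end.month, 7)
    return date(year, month, calendar.monthrange(year, month)[1])


def extended_deadline(plan_year_end):
    """Regular deadline plus two and a half months: the 15th of the tenth month."""
    year, month = add_months(plan_year_end.year, plan_year_end.month, 10)
    return date(year, month, 15)


# Calendar-year plan for plan year 2023.
regular = filing_deadline(date(2023, 12, 31))
extended = extended_deadline(date(2023, 12, 31))
```

For a calendar-year plan this gives July 31 and October 15 of the following year, so a plan on extension may not appear in the data until nearly ten months after its year closes.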
Despite these limitations, Form 5500 remains the most comprehensive public source of information on private-sector employee benefit plans in the United States. No other dataset covers the full population of plans at this level of financial detail, updated annually, with records going back decades. For anyone investigating retirement security, fiduciary conduct, or the trajectory of employer-sponsored benefits in the American economy, it is the starting point.