
The US Power Grid in Data: Joining EIA Plants, Ownership, Generation, and FERC

12 min read · AI Analytics
Tags: Energy, EIA, FERC, Electricity, Data Engineering

The American electric grid is, from the right distance, four federal datasets stacked on top of one another. The Energy Information Administration keeps a census of every utility-scale power plant and generator—its capacity, its fuel, its location—and a companion schedule of who owns each one; it adds the electricity data on what those generators produce and what the power sells for; and the Federal Energy Regulatory Commission keeps the enforcement record of who has manipulated the wholesale markets the power flows through. Joined on the plant and the operator, these four sources turn the abstraction “the grid” into a queryable map—the physical machine, its owners, its output, and its oversight, all in one view.

This article covers the four federal energy datasets and what each contributes; the EIA as the statistical census-taker and what Form 860 and Form 923 actually record; the ownership schedule and the difference between an owner and an operator; the electricity generation and price data and the questions it answers; FERC as the market regulator and the anti-manipulation authority Congress gave it in the Energy Policy Act of 2005; the join keys—the plant code that ties the EIA tables together and the operator name that bridges to FERC—and the company-name normalization that the bridge requires; the policy questions the assembled data answers, from the fuel mix and coal's retirement to ownership concentration; a Python workflow that pulls a utility's plants and generators, aggregates capacity by fuel, and flags any FERC enforcement against it; and the caveats—name-matching ambiguity, the parent-subsidiary problem, reporting lag, and the limits of utility-scale coverage—that every analyst must internalize first.

The four datasets and what each one is

The map is assembled from four federal energy datasets, three of them from the EIA and one from FERC. The first is the plant and generator inventory, EIA Form 860, which catalogs every utility-scale generating unit in the country: its capacity, fuel, technology, location, and operating status. The second is the ownership schedule, a companion to Form 860, which records who owns and who operates each plant. The third is the electricity data, Form 923 and the EIA's market series, which adds what the generators actually produced, the fuel they consumed, and what electricity sold for. The fourth is the FERC enforcement record: the public docket of market-manipulation actions brought by the agency that polices the wholesale markets. The first three describe the physical and financial picture; the fourth is the conduct layer that sits over it.

In our database these are four tables—eia_plants, eia_owners, eia_electricity, and ferc_enforcement—and the entire value of the exercise is in joining them. None of the four is especially illuminating alone: the plant inventory is a list of machines without output or ownership; the ownership schedule is a list of corporate names without the assets they attach to; the generation data is a stream of megawatt-hours without the physical context that explains them; and the FERC record is a docket of company names without the fleets behind them. Keyed together—the plant code tying the EIA tables, the operator name bridging to FERC—they become a single, navigable account of the grid that no one of them could provide. The work, as the rest of this article makes clear, is almost entirely in the joining: aligning the plant codes across the EIA schedules and normalizing operator names across the boundary to FERC.

The EIA: the census-taker of American energy

The Energy Information Administration (EIA) is the Department of Energy's independent statistical agency. Its independence is statutory and deliberate: it collects, analyzes, and disseminates energy information without the data being shaped by the policy agenda of the rest of the Department, so that its numbers can serve as a neutral, authoritative reference for Congress, regulators, industry, and the public alike. In the electric-power sector the EIA is, in effect, the census-taker—it runs the surveys that account for the nation's generating fleet and its output, and the public data it publishes is the closest thing the United States has to a complete, official inventory of how its electricity is made.

Form EIA-860 is the inventory survey. It is the annual census of generators, collecting, for every utility-scale generating unit in the country, a structured description of the machine and the plant that houses it. For each generator the form records its nameplate capacity (its rated maximum output in megawatts), its energy source (the fuel or resource it burns or harnesses: coal, natural gas, nuclear, wind, solar, water, and so on), its prime mover (the technology that turns energy into electricity: steam turbine, combustion turbine, combined cycle, photovoltaic, wind turbine, hydro turbine, battery), its operating status (operating, standby, retired, planned), and its key dates. The companion plant-level schedules record the facility's name, its location down to latitude and longitude, its balancing authority, and the utility responsible for it. This is the structural backbone of the whole map: it is Form 860 that tells you what physically exists and where.

Form EIA-923 is the operations survey—the companion that records what the inventory actually did. Where Form 860 captures capacity, Form 923 captures generation (the megawatt-hours each plant produced), fuel consumption and quality (how much fuel of what heat content was burned, and at what cost), and the resulting fuel-and-emissions picture. Alongside Form 923, the EIA publishes a family of electricity series—state and sector sales, average retail prices, and wholesale data—that complete the financial half of the picture. The division of labor is clean and worth holding onto: Form 860 is the what exists, Form 923 is the what it produced, and the price series is the what it sold for.

The plant and generator inventory

The inventory is hierarchical, and getting the hierarchy right is the first thing an analyst must do. The unit of physical existence is the plant: a single facility at a single location, identified by an EIA plant code. Within a plant there are one or more generators: individual generating units, each with its own identifier within the plant, its own capacity, its own fuel, and its own status. A large gas plant might contain several combustion turbines and a steam unit; a wind farm is a single plant whose “generator” rows aggregate its turbines. The plant carries the geography and the operator; the generator carries the capacity-and-fuel detail. The columns that matter most, across the plant and generator schedules, are these:

-- plant-level (eia_plants):
plant_code            -- EIA's persistent plant identifier (the join key)
plant_name            -- the facility name
utility_id            -- the operating utility's EIA ID
utility_name          -- the operator of record
state / county        -- location of the facility
latitude / longitude  -- point coordinates of the plant
balancing_authority   -- the grid operator the plant reports to
-- generator-level (joined on plant_code):
generator_id          -- unit identifier within the plant
nameplate_capacity_mw -- rated maximum output in megawatts
energy_source_1       -- primary fuel / resource (coal, NG, nuclear, wind...)
prime_mover           -- ST, CT, CA, PV, WT, HY, BA -- the generating tech
status                -- operating, standby, retired, planned
operating_year        -- year the unit entered service
planned_retirement    -- announced retirement year, where reported

The plant_code is the load-bearing column—the persistent identifier that ties every EIA record about a facility together. It is the key that joins the generator schedule to the plant schedule, the ownership schedule to both, and the Form 923 generation to all three. Because the plant code is stable over time, it is also what lets an analyst follow a facility across years: track a coal plant's units retiring one by one, watch a gas plant add a combined-cycle unit, or see a utility-scale solar plant first appear as “planned” and then flip to “operating.” The energy_source_1 and prime_mover columns together encode the fuel-and-technology profile that drives almost every aggregate question about the grid, and the status and planned_retirement columns are what make the inventory a moving picture of the energy transition rather than a static snapshot.
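The plant-to-generator hierarchy can be made concrete with a small pandas sketch. The rows below are invented toy data, not EIA records, and the column names simply follow the schema listed above; the point is only to show generators joining to their plant on plant_code and capacity rolling up to the plant level.

```python
import pandas as pd

# Toy rows standing in for the plant and generator schedules (not real EIA data).
plants = pd.DataFrame({
    "plant_code": [101, 102],
    "plant_name": ["Example Gas Station", "Example Wind Farm"],
    "utility_name": ["Example Power Co", "Example Power Co"],
    "state": ["NC", "TX"],
})
gens = pd.DataFrame({
    "plant_code": [101, 101, 101, 102],
    "generator_id": ["CT1", "CT2", "ST1", "WT1"],
    "nameplate_capacity_mw": [180.0, 180.0, 200.0, 150.0],
    "energy_source_1": ["NG", "NG", "NG", "WND"],
})

# Every generator row should find exactly one plant row; validate raises
# a MergeError if plant_code is duplicated on the plant side.
fleet = gens.merge(plants, on="plant_code", how="left", validate="many_to_one")

# Roll generator capacity up to the plant, the unit that carries geography.
plant_mw = fleet.groupby(["plant_code", "plant_name"])["nameplate_capacity_mw"].sum()
print(plant_mw)
```

The validate argument is a cheap guard: if a release ever carried a duplicated plant code, the merge would fail loudly instead of silently multiplying generator rows.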

The ownership schedule: owners versus operators

The single most important conceptual distinction in this data—and the one most easily missed—is between the operator and the owner of a plant. The Form 860 plant schedule names the operating utility: the entity that runs the facility day to day and reports it to the EIA. But the operator is not necessarily the owner, and a single plant is frequently owned by several parties in defined fractional shares. A merchant power plant might be operated by one company while owned jointly by a utility, an independent power producer, and a financial investor; a nuclear plant is commonly co-owned by a consortium of utilities. The ownership schedule—a separate Form 860 schedule—is what records these shares: for each owned generator it lists each owner, the owner's identity, and the percentage of the generator the owner holds.

This distinction has real analytical consequences. If you want to know who operates the most capacity, you aggregate the plant-schedule operating utility; if you want to know who owns the most capacity, you must go to the ownership schedule and weight each generator's capacity by each owner's percentage share, because attributing a jointly owned plant's full capacity to its operator, or to any single owner, double-counts or mis-assigns it. The ownership schedule keys back to the same plant_code and generator identifier, so it joins cleanly onto the inventory, but it changes the arithmetic: a fleet rolled up by ownership share is a different view from a fleet rolled up by operator, and for questions of market concentration it is the correct one. The schedule is also the layer at which the corporate family becomes visible: it is here that an ostensibly independent plant turns out to be majority-held by a large utility holding company, which is exactly the relationship that matters when the analysis turns to who really controls the grid.
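The difference between the two roll-ups is easiest to see in arithmetic. The sketch below uses invented toy rows and assumes a percent_owned column on a 0 to 100 scale; the scale and column name in a real release should be checked before reusing this.

```python
import pandas as pd

# Toy generator and ownership rows (invented, not EIA data).
gens = pd.DataFrame({
    "plant_code": [201, 202],
    "generator_id": ["1", "1"],
    "operator": ["Operator A", "Operator B"],
    "nameplate_capacity_mw": [1000.0, 500.0],
})
owners = pd.DataFrame({
    "plant_code": [201, 201, 201, 202],
    "generator_id": ["1", "1", "1", "1"],
    "owner_name": ["Operator A", "Utility X", "Investor Y", "Operator B"],
    "percent_owned": [50.0, 30.0, 20.0, 100.0],  # assumed 0-100 scale
})

# Operator view: each generator's full capacity goes to whoever runs it.
operator_view = gens.groupby("operator")["nameplate_capacity_mw"].sum()

# Ownership view: join on the composite key and weight by fractional share.
shares = owners.merge(gens, on=["plant_code", "generator_id"],
                      validate="many_to_one")
shares["owned_mw"] = shares["nameplate_capacity_mw"] * shares["percent_owned"] / 100.0
owner_view = shares.groupby("owner_name")["owned_mw"].sum()

# Sanity check: shares for each generator should total 100 percent.
share_totals = owners.groupby(["plant_code", "generator_id"])["percent_owned"].sum()

print(operator_view, owner_view, share_totals, sep="\n\n")
```

In this toy fleet Operator A runs 1,000 MW but owns only 500 MW of it. Note that the inner merge drops any generator missing from the ownership table, so the owned total is worth reconciling against the inventory total.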

The electricity data: generation and prices

Capacity and ownership describe the machine and who holds it; the electricity data describes what the machine does and what its output is worth. Two strands matter. The first is generation—the actual electricity produced, from Form 923—which transforms the inventory from a catalog of potential into a record of performance. A plant's nameplate capacity tells you how much it could produce; its generation tells you how much it did, and the ratio between the two—the capacity factor—is one of the most revealing numbers in the whole dataset. A baseload nuclear or efficient combined-cycle gas plant runs near its capacity most of the year; a peaking combustion turbine or an older coal unit runs only when prices are high; a solar plant produces only when the sun shines. Joining generation to capacity by plant code is what lets an analyst ask not just what exists but what runs hardest, and which units have effectively become standby reserves long before they are formally retired.
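The capacity factor is generation divided by what the plant would have produced running at nameplate for every hour of the period. A minimal sketch with invented annual figures follows; the net_generation_mwh column name and the 8,760-hour non-leap year are assumptions of the example.

```python
import pandas as pd

HOURS_PER_YEAR = 8760  # non-leap year; adjust for leap years or partial periods

# Toy plant-level capacity and annual generation (invented figures).
capacity = pd.DataFrame({
    "plant_code": [301, 302, 303],
    "nameplate_capacity_mw": [1000.0, 200.0, 100.0],
})
generation = pd.DataFrame({
    "plant_code": [301, 302, 303],
    "net_generation_mwh": [7_884_000.0, 87_600.0, 219_000.0],
})

cf = capacity.merge(generation, on="plant_code", validate="one_to_one")
cf["capacity_factor"] = cf["net_generation_mwh"] / (
    cf["nameplate_capacity_mw"] * HOURS_PER_YEAR
)
print(cf.sort_values("capacity_factor", ascending=False))
```

The three toy plants land at roughly 0.90, 0.25, and 0.05, which is the baseload, intermittent, and peaker pattern described above. A unit that entered service or retired partway through the year needs its hours prorated, which this sketch does not do.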

The second strand is prices and sales. The EIA publishes electricity sales and average retail prices by state and by customer sector—residential, commercial, industrial—along with wholesale and fuel-cost data. These series put a value on the output and a cost on the inputs, and they are what let the physical map connect to the economic questions: how the fuel mix of a region relates to what its customers pay, how a fuel price shock propagates through to wholesale and retail electricity, and how the economics of an aging coal unit compare to the gas and renewables displacing it. The price data is also where the EIA picture begins to touch FERC's world, because the wholesale prices that FERC's markets set are precisely the prices that market manipulation distorts—which is the bridge to the fourth dataset.

FERC: the regulator over the wholesale market

The Federal Energy Regulatory Commission (FERC) is the independent agency that regulates the interstate transmission of electricity and the wholesale sale of electric power. Where the EIA counts the grid, FERC governs a critical part of it: it oversees the wholesale electricity markets—the markets in which generators sell power to utilities and load-serving entities—and the interstate high-voltage transmission system over which that power moves. Its jurisdiction is the seam between the physical generating fleet that the EIA inventories and the prices at which that fleet's output changes hands. FERC's role is the conduct layer over the EIA's physical and financial picture: the same plants and the same prices, viewed through the lens of whether the market that set those prices was operated fairly.

FERC's modern enforcement teeth come from the Energy Policy Act of 2005. In the wake of the Western energy crisis of the early 2000s—in which trading strategies exploited the design of California's newly restructured electricity market to drive prices to extraordinary levels—Congress gave FERC explicit anti-manipulation authority: the power to prohibit, and to penalize, the use of manipulative or deceptive devices in connection with the purchase or sale of wholesale electricity and transmission service. This authority, modeled on the securities laws' anti-fraud provisions, transformed FERC from a rate-setting body into a market cop with the ability to investigate, to levy substantial civil penalties, and to order the disgorgement of unjust profits. FERC enforcement actions—the dataset's fourth table—are the public record of how that authority has been used: the companies investigated, the conduct alleged, and the penalties imposed for manipulating the very markets the EIA's price data measures.

The conduct FERC polices ranges from sophisticated trading schemes—placing physical trades at a loss to move an index that profits a larger financial position, or scheduling power in patterns designed to capture congestion payments without serving any real demand—to the withholding of generation to raise prices. What unites the cases is that they distort the wholesale price formation the EIA records as ordinary market data. This is why the FERC enforcement record is the natural fourth layer: it is the dataset that answers, for the biggest players in the EIA inventory, whether the markets their output flows through have been manipulated, and by whom.

The join keys: plant code and operator name

Assembling the four datasets into one view rests on two join keys, and they are of very different quality. The first—internal to the EIA—is the plant code, and it is excellent. Because the EIA assigns a persistent plant code to every facility and carries it on the plant schedule, the generator schedule, the ownership schedule, and the Form 923 generation, the three EIA tables join to one another cleanly and unambiguously. A generator joins to its plant on the plant code; an ownership row joins to its generator on the plant code and generator identifier; a generation record joins to its plant on the plant code. Within the EIA universe the data is genuinely relational, and the joins are exact.

The second join key—the bridge from the EIA to FERC—is the company name, and it is the hard part. FERC enforcement actions name a company: the respondent in the docket. The EIA identifies the operator and the owners of a plant, also by company name (and by EIA utility ID). There is no shared numeric key linking a FERC respondent to an EIA utility ID, so the only way to connect a FERC case to a fleet is to match the company names across the two sources—and company names are messy. The same firm appears as “Duke Energy,” “Duke Energy Carolinas, LLC,” and “Duke Energy Corporation” in different records; subsidiaries carry names that differ from their parents; legal suffixes (LLC, Inc., L.P., Corp.) vary; and trading affiliates often bear names unlike the operating utilities they belong to. The bridge is therefore company-name normalization—stripping suffixes, standardizing punctuation and spacing, and resolving aliases—and it is reliable enough to generate candidates but never to be trusted blindly. A name match across the EIA/FERC boundary is a record to confirm, not a conclusion to assert.
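One way to generate candidates rather than conclusions is to score normalized names for similarity and keep everything above a threshold for manual review. The sketch below uses the standard library's difflib for the scoring, which is a substitute technique chosen for illustration; the suffix list, the 0.85 threshold, and the company names are all invented.

```python
from difflib import SequenceMatcher

# Legal suffixes dropped as whole tokens (illustrative, not exhaustive).
SUFFIXES = {"LLC", "INC", "CORP", "CORPORATION", "CO", "COMPANY", "LP", "LTD"}


def normalize(name):
    # Uppercase, strip periods, turn commas into spaces, drop suffix tokens.
    cleaned = name.upper().replace(".", "").replace(",", " ")
    return " ".join(t for t in cleaned.split() if t not in SUFFIXES)


def candidates(operator, respondents, threshold=0.85):
    # Return (respondent, score) pairs worth a manual look, best first.
    target = normalize(operator)
    scored = []
    for r in respondents:
        score = SequenceMatcher(None, target, normalize(r)).ratio()
        if score >= threshold:
            scored.append((r, round(score, 2)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


respondents = ["Example Energy Corporation", "Acme Power Marketing Inc"]
print(candidates("Example Energy Corp.", respondents))
```

A lower threshold surfaces more affiliates at the cost of more false positives; either way, the output is a review queue, not a finding.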

The deeper complication beneath the name problem is the corporate hierarchy. The entity FERC names in an enforcement action—often a trading or marketing affiliate—may not be the entity the EIA lists as the operator of a plant, even though both belong to the same parent holding company. Rolling a fleet up to its ultimate parent, and then asking whether any entity in that corporate family has a FERC enforcement history, requires a layer of corporate-family resolution that neither dataset provides on its own. The ownership schedule helps—it exposes the parent-subsidiary ownership relationships within the EIA data—but the final bridge to FERC respondents is a name-and-hierarchy matching exercise, and it is where most of the real engineering effort in this analysis lives.
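A corporate-family roll-up can be sketched as a walk up a child-to-parent map. The map below is hypothetical and hand-built, since neither dataset supplies one, and the entity names are invented and assumed to be normalized already.

```python
# Hypothetical, hand-curated child -> parent links (not from EIA or FERC).
PARENT_OF = {
    "EXAMPLE ENERGY CAROLINAS": "EXAMPLE ENERGY",
    "EXAMPLE ENERGY TRADING": "EXAMPLE ENERGY",
    "EXAMPLE ENERGY": "EXAMPLE HOLDINGS",
}


def ultimate_parent(entity, parent_of=PARENT_OF):
    # Follow parent links to the top; the seen set guards against cycles.
    seen = set()
    while entity in parent_of and entity not in seen:
        seen.add(entity)
        entity = parent_of[entity]
    return entity


def family_hits(operator, respondents, parent_of=PARENT_OF):
    # Respondents that share an ultimate parent with the operator.
    family = ultimate_parent(operator, parent_of)
    return [r for r in respondents if ultimate_parent(r, parent_of) == family]


respondents = ["EXAMPLE ENERGY TRADING", "ACME POWER MARKETING"]
print(family_hits("EXAMPLE ENERGY CAROLINAS", respondents))
```

Here a trading affiliate is attributed to the same family as the operating utility even though the two names share little text. The quality of the result depends entirely on how the parent map is curated, and each hit still needs confirming against the underlying records.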

Questions the assembled data answers

Assembled, the four datasets answer the questions that drive energy policy—questions that no single source can address. The most prominent is the fuel mix and the energy transition. By aggregating generator capacity and generation by energy source, and tracking it across years through the stable plant codes, the data shows how fast coal is retiring and how quickly natural gas, wind, solar, and battery storage are being added. The status and planned_retirement columns make the trajectory legible: coal units flipping from operating to retired, gas combined-cycle and renewable capacity coming online, storage appearing as an entirely new prime-mover category. This is the single most-watched story in the American power sector, and the EIA inventory is the authoritative way to measure it.
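Tracking the fuel mix across years amounts to stacking annual inventories, filtering to operating units, and pivoting capacity by year and energy source. The sketch below uses invented rows, plain-word fuel labels rather than EIA codes, and a data_year column that is an assumption of the example (it would be added when the annual files are stacked).

```python
import pandas as pd

# Toy stacked inventory for two data years (invented, not EIA data).
inventory = pd.DataFrame({
    "data_year": [2020, 2020, 2020, 2022, 2022, 2022, 2022],
    "plant_code": [401, 402, 403, 401, 402, 403, 404],
    "energy_source_1": ["coal", "gas", "wind", "coal", "gas", "wind", "solar"],
    "status": ["operating", "operating", "operating",
               "retired", "operating", "operating", "operating"],
    "nameplate_capacity_mw": [600.0, 400.0, 100.0, 600.0, 400.0, 100.0, 200.0],
})

# Keep only units that were operating in each data year.
operating = inventory[inventory["status"] == "operating"]

# Capacity in MW by year (rows) and energy source (columns).
mix = operating.pivot_table(index="data_year", columns="energy_source_1",
                            values="nameplate_capacity_mw",
                            aggfunc="sum", fill_value=0.0)

# Each source's share of that year's operating capacity.
share = mix.div(mix.sum(axis=1), axis=0)
print(mix, share.round(3), sep="\n\n")
```

In the toy data the coal plant keeps its plant code but flips to retired in 2022, so its 600 MW leaves the operating mix while a new solar plant enters it.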

The second is ownership concentration. Using the ownership schedule to roll capacity up by owner—and then up again to the ultimate corporate parent—reveals how the grid's assets concentrate: how much capacity sits with large regulated utility holding companies, how much with merchant independent power producers, and how much with financial owners. Because the ownership schedule carries fractional shares, this view can be done correctly rather than by the crude expedient of attributing each plant to its operator. The third question is which plants run hardest: joining Form 923 generation to nameplate capacity yields capacity factors that separate the baseload workhorses from the peakers and the effectively idle, and reveals where the real energy is coming from as opposed to merely where the capacity sits.

The fourth, and the one that requires all four datasets, is how market oversight maps onto the biggest players. Having rolled the fleet up to its parents and attached each parent's generation and fuel profile, an analyst can overlay the FERC enforcement record—flagging which of the largest owners and operators have market-manipulation histories. The result is a view in which the physical scale of a company (its capacity and generation), its market position (its share of a region's output), and its conduct record (its FERC enforcement exposure) sit side by side. That is the payoff of the join: a single picture in which the grid's physical reality, its financial output, its ownership, and its regulatory conduct can be reasoned about together.

Python workflow: a utility's fleet, its fuel mix, and FERC flags

The script below works the core of the problem end to end. It pulls the EIA Form 860 bulk archive—which bundles the plant, generator, and ownership schedules in one downloadable ZIP—resolves the schedule files and their column names defensively (the EIA's year-stamped file names and two-row headers shift between releases), filters the plants to a single operating utility, joins the generators on the plant code, and aggregates nameplate capacity by fuel to produce the operator's fuel mix. It then normalizes the operator's name and screens it against a list of FERC enforcement respondents, flagging any candidate match. The EIA bulk files and the EIA API are public; an EIA API key is free and the demo key works for light use. Requirements: requests and pandas (plus an Excel engine such as openpyxl).

import io
import zipfile

import pandas as pd
import requests

# Mapping a utility's fleet from federal energy data.
#
# Two public sources are used together:
#   1. EIA Form 860 / Form 923 bulk files (Excel/ZIP) -- the plant and
#      generator inventory, the ownership schedule, and generation.
#   2. The EIA API (an api_key is free; "DEMO_KEY" works for light use)
#      for the electricity series.
# FERC enforcement actions are a separate public list keyed by company
# NAME, not by EIA plant code, so the bridge is name normalization.
#
# The Form 860 bulk archive bundles several schedules in one ZIP; the
# exact file names change by data year, so resolve them at runtime
# rather than hard-coding the year-stamped names.
EIA860_ZIP = "https://www.eia.gov/electricity/data/eia860/xls/eia8602022.zip"


def _load_860(url=EIA860_ZIP):
    r = requests.get(url, timeout=300)
    r.raise_for_status()
    zf = zipfile.ZipFile(io.BytesIO(r.content))
    names = zf.namelist()

    def pick(*needles):
        for n in names:
            low = n.lower()
            if all(k in low for k in needles):
                return n
        return None

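    # Schedule 2 = plants; Schedule 3 = generators; Schedule 4 = ownership.
    plant_f = pick("2___plant")
    gen_f   = pick("3_1_generator")
    own_f   = pick("4___owner")
    missing = [label for label, f in
               (("plant", plant_f), ("generator", gen_f), ("owner", own_f))
               if f is None]
    if missing:
        # File names shift between releases; fail with a readable message
        # instead of passing None to zf.open().
        raise FileNotFoundError(f"Form 860 schedule(s) not found: {missing}")
    # The EIA sheets carry two header rows; header=1 (0-indexed) selects
    # the second row, which holds the real column names.
    plants = pd.read_excel(zf.open(plant_f), header=1)
    gens   = pd.read_excel(zf.open(gen_f),   header=1)
    owners = pd.read_excel(zf.open(own_f),   header=1)
    return plants, gens, owners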


def _col(frame, *needles):
    # Find the first column whose lowercased name contains all needles.
    for c in frame.columns:
        low = str(c).lower()
        if all(k in low for k in needles):
            return c
    return None


def norm_name(name):
    # Crude company-name normalization to bridge EIA operators to FERC
    # respondents. A FIRST-PASS join only -- a match is a record to
    # confirm by hand, not a determination (see the caveats).
    # Suffixes are dropped as whole tokens: substring replacement would
    # mangle names such as "CORPORATION" or "CONSOLIDATED".
    suffixes = {"LLC", "INC", "INCORPORATED", "CORP", "CORPORATION", "CO",
                "COMPANY", "LP", "LTD", "HOLDINGS"}
    s = str(name or "").upper().replace(".", "").replace(",", " ")
    return " ".join(t for t in s.split() if t not in suffixes)


def fleet_for(operator, ferc_names):
    plants, gens, owners = _load_860()
    pcode  = _col(plants, "plant", "code")
    pname  = _col(plants, "plant", "name")
    util   = _col(plants, "utility", "name")
    g_pcode = _col(gens, "plant", "code")
    g_cap   = _col(gens, "nameplate", "capacity")
    g_fuel  = _col(gens, "energy", "source", "1") or _col(gens, "fuel")

    mine = plants[plants[util].map(norm_name) == norm_name(operator)]
    codes = set(mine[pcode])
    fleet = gens[gens[g_pcode].isin(codes)].copy()

    # --- 1. Capacity by fuel/energy source ------------------------------
    by_fuel = (fleet.groupby(g_fuel)[g_cap].sum()
                    .sort_values(ascending=False))
    total = by_fuel.sum()
    print(f"{operator}: {len(codes)} plants, "
          f"{total:,.0f} MW nameplate across {len(by_fuel)} fuels")
    for fuel, mw in by_fuel.head(8).items():
        print(f"  {str(fuel):<8} {mw:>10,.0f} MW  ({mw / total:.1%})")

    # --- 2. Flag any FERC enforcement against the operator --------------
    n = norm_name(operator)
    hits = [r for r in ferc_names if n and n in norm_name(r)]
    if hits:
        print(f"  FERC ENFORCEMENT CANDIDATE(S): {hits[:3]}")
    else:
        print("  No FERC enforcement candidate on the supplied list.")
    return by_fuel


# FERC publishes enforcement actions (market-manipulation cases under the
# EPAct 2005 anti-manipulation authority) as a public list; load the
# respondent names there. Placeholder names shown for illustration.
ferc_respondents = ["Example Energy Trading LLC", "Acme Power Marketing Inc"]

fleet_for("Duke Energy Carolinas, LLC", ferc_respondents)

Two refinements turn this first pass into rigorous analysis. First, the capacity aggregation in the script attributes each generator's full nameplate capacity to its operating utility, which is the operator view; for an ownership view it must instead join the ownership schedule and weight each generator's capacity by each owner's percentage share, then roll the owners up to their ultimate parents—the only correct basis for a concentration analysis. Second, the FERC screen here is a deliberately crude substring match on normalized names; a serious version must resolve the corporate hierarchy so that a FERC action against a trading affiliate is correctly attributed to the parent whose generating fleet appears in the EIA data, and every candidate match must be confirmed against the underlying records rather than accepted on the strength of the name alone. For generation and price metrics, the EIA API's electricity series and the Form 923 bulk files supply the megawatt-hours and dollars that turn the capacity-by-fuel picture into a generation-and-economics one.

Limitations and analytical caveats

The four datasets together are the most complete public account of the American grid, but the join that makes them powerful also introduces failure modes an analyst must internalize before drawing conclusions.

The EIA-to-FERC bridge is name matching, with all that implies. There is no shared key linking a FERC respondent to an EIA utility ID, so the connection rests on matching company names across the boundary—and company names collide, vary, and mislead. A normalized-name match is a candidate to investigate, never a determination that a particular fleet has a FERC enforcement history. Both errors are real: a false positive wrongly tars a utility with another firm's conduct, and a false negative misses a genuine match hidden behind an affiliate's unfamiliar name. The match quality is only as good as the normalization and the entity resolution applied to it.

The parent-subsidiary problem is pervasive. The entity the EIA lists as a plant's operator, the entities the ownership schedule lists as its owners, and the entity FERC names in an enforcement action are frequently different members of the same corporate family. Attributing capacity, generation, or conduct correctly requires rolling every entity up to its ultimate parent—a corporate-hierarchy resolution that neither dataset supplies. Analysis that stops at the operating-utility or named-respondent level will both under-count the largest players' true footprints and mis-locate their enforcement exposure.

There is reporting lag and a release cadence. Form 860 is an annual survey, and Form 923, although it collects monthly data from part of its respondent pool, reaches final form only on an annual cycle; both are published with a lag after the data year closes, and the bulk files are revised. The most recent year is therefore always incomplete or preliminary, and a snapshot taken today describes a grid that is some months behind the physical reality: a coal unit that retired this year may still read as operating, and a planned solar plant may not yet appear. FERC enforcement actions, similarly, appear only after the agency acts, well after the conduct. This data is authoritative for established structure and multi-year trends; it is not a real-time monitor of the grid.

Coverage is utility-scale, and an enforcement action is not a verdict on a fleet. Form 860 inventories utility-scale generators above the survey threshold; the vast and fast-growing population of small distributed resources—rooftop solar, behind-the-meter storage—is captured only partially or in aggregate, so the inventory understates distributed capacity. And a FERC enforcement record attaches to a company's conduct in a market, not to the safety, reliability, or value of the plants it owns; flagging an operator's FERC history is a signal about market behavior, not a judgment on its generating fleet. Held with these caveats—name-matching ambiguity, corporate hierarchy, reporting lag, and the scope of coverage—the joined EIA and FERC datasets are a uniquely powerful map: the physical grid, its owners, its output, and its oversight, assembled from public federal data into a single view of how the country makes, sells, and polices its electricity.

Related writing

EIA Form 860: The Federal Database Behind Every US Power Plant and Electricity Generator — The foundational inventory layer of this map, taken on its own terms: the plant-and-generator census, its capacity, fuel, prime-mover, and status fields, and the plant code that ties the whole EIA universe together.

EIA Generator Ownership: The Federal Record of Who Owns America's Power Plants — A deep dive on the ownership schedule used here to separate owners from operators, weight capacity by fractional share, and roll fleets up to their corporate parents—the layer that makes ownership-concentration analysis correct rather than approximate.

FERC Enforcement: The Federal Watchdog Over Energy Market Manipulation — The conduct layer in full: the anti-manipulation authority from the Energy Policy Act of 2005, the kinds of schemes FERC pursues, and the enforcement record this article overlays onto the biggest players in the EIA inventory.