
Is College Worth It? Measuring Value by Joining IPEDS and the College Scorecard

12 min read · AI Analytics
Tags: Education, College Scorecard, IPEDS, Student Debt, Data Engineering

The question every prospective student and worried parent asks—is this college worth it?—has, for the first time, a data-driven answer that can be given school by school and, increasingly, program by program. The answer lives in two federal datasets that are designed to be read together: one is the mandatory census of what every college charges, enrolls, and graduates; the other is the record, drawn from federal student-aid and tax files, of the debt students take on and the money they later earn. Join them on the institution's identifier and the price of a degree sits in the same row as the payoff.

This article covers what the two datasets are and why they are complementary halves of a single picture; how NCES IPEDS supplies the institutional side—enrollment, tuition and net price, admissions, retention, and completion—while the College Scorecard supplies the outcomes side—median debt, repayment, and post-enrollment earnings; the join key that makes the marriage clean (the IPEDS UnitID and the federal student-aid OPEID, both of which the Scorecard carries) and the Classification of Instructional Programs (CIP) code that extends the analysis to the program level; how value differs across the public, private-nonprofit, and for-profit sectors, and where high graduation rates and high debt diverge; the policy frame from gainful employment to the Scorecard's own publication mandate; the public access paths—the NCES site, the Urban Institute Education Data API, and the api.data.gov-keyed Scorecard API; a Python workflow that pulls net price, completion, median debt, and earnings and computes an earnings-to-debt value ratio; and the caveats—selection effects, suppression, and the limits of a single number—that every honest comparison has to respect.

Two datasets, two halves of one question

The cost-of-college debate has long suffered from a missing denominator. The price of a degree was visible enough—sticker tuition is published, and net price after aid is calculable—but the payoff was anecdotal: a brother-in-law who majored in philosophy and does fine, a cousin with a six-figure loan and a job that does not service it. The two federal datasets at the center of this article close that gap by putting price and payoff in the same frame. NCES IPEDS—the Integrated Postsecondary Education Data System—is the institutional census: every college that participates in federal student aid must report, every year, its enrollment, its tuition and net price, its admissions selectivity, its retention and completion rates, its finances, and its faculty. The College Scorecard, published by the U.S. Department of Education, is the outcomes layer: built largely from the National Student Loan Data System and matched IRS earnings records, it reports the median debt aided students carry, how they repay it, and what they earn years after they enrolled.

They are deliberately complementary. IPEDS is comprehensive on inputs and process—who enrolls, what they pay, how many finish—but it has historically been thin on what happens after graduation, because a college does not know what its alumni earn. The Scorecard fills exactly that void, because the federal government does know: it holds the loan records and can match them, in aggregate and with privacy protections, to tax filings. Neither alone answers whether college is worth it. IPEDS tells you the cost and the odds of finishing; the Scorecard tells you the debt and the earnings. Only joined do they let you divide one by the other. Our work stores them as two tables—the IPEDS institutional-characteristics, enrollment, and completion data as nces_ipeds, and the Scorecard cost, debt, and earnings outcomes as college_scorecard—each keyed so that the join is a clean institutional match rather than a fuzzy entity-resolution problem.

The IPEDS half: what colleges charge, enroll, and graduate

IPEDS is administered by the National Center for Education Statistics (NCES), the statistical arm of the Department of Education, and its defining feature is that reporting is mandatory. Any institution that participates in the federal student financial aid programs authorized under Title IV of the Higher Education Act—essentially every college that accepts federal grants or loans—is required by law to complete the IPEDS surveys. That mandate is what makes IPEDS a true census rather than a sample: it covers the full population of degree- and certificate-granting institutions, from the flagship research university to the for-profit cosmetology school, on the order of six thousand institutions reporting every year.

The surveys are organized into components that collectively describe an institution from every angle. Institutional characteristics carry the directory information and, crucially, the published tuition and fees and the calculated average net price: the price a student actually pays after grant and scholarship aid is subtracted, which is the only price figure that means anything for a value calculation. Fall enrollment and twelve-month enrollment count the student body by level, attendance status, and demographics. Admissions records selectivity: applications, admits, yield, and test-score ranges. Graduation rates and outcome measures track the share of students who complete within set windows, and the related retention measures track first-to-second-year persistence. Completions count the degrees and certificates awarded by field of study and credential level, and finance and human resources describe institutional budgets and staffing. For the worth-it question, the load-bearing IPEDS fields are net price (the cost), graduation rate (the odds of getting the thing you paid for), and the institutional context (sector, selectivity, size) that any fair comparison has to hold constant.

The Scorecard half: debt, repayment, and earnings

The College Scorecard exists to make the payoff side of college visible and comparable. Its distinctive value is that it reports outcomes IPEDS cannot, because it is assembled from the federal government's own administrative records rather than from anything a college self-reports about its graduates. The debt and repayment figures come from the federal student-aid system—the loan records of students who borrowed—and the earnings figures come from a privacy-protected match to tax records, reporting the earnings of the cohort of federally aided students some years after they first enrolled.

Three families of outcome carry the weight. The first is median debt: the typical loan balance students take on, reported separately for those who complete and those who do not—a distinction that matters enormously, because a student who borrows and leaves without a credential carries the cost with none of the benefit. The second is repayment: measures of whether borrowers are paying their loans down, in deferment or forbearance, or in default—a signal of whether the debt is sustainable on the earnings the degree actually produces. The third, and the one that transformed the debate, is post-enrollment earnings: the median earnings of aided students measured at fixed horizons after they entered, which puts a dollar figure on the payoff. Because earnings are reported for the population of federally aided students—not a survey, not alumni who chose to respond—they are far harder to game than the placement statistics colleges once advertised. The Scorecard is also published with an explicit comparability mandate: the Department designed it so that the same fields, defined the same way, are available for every institution, which is precisely what lets a value ratio be computed identically across thousands of schools.

The join key: UnitID, OPEID, and the clean match

The reason these two datasets can be married without the misery of fuzzy name-matching is that they share institution identifiers, and the Scorecard deliberately carries both of them. IPEDS assigns every institution a UnitID—a stable numeric key that uniquely identifies a college across all of the IPEDS surveys and across years. The federal student-aid system assigns each institution an OPEID (the Office of Postsecondary Education Identification number), the key under which the school's Title IV participation, loans, and grants are tracked. These two identifiers come from different administrative lineages—one statistical, one financial-aid—and historically that split is what made it hard to put institutional characteristics next to financial outcomes.

The College Scorecard solves it by carrying both. Each Scorecard institution row exposes its UnitID (in the API, the field named id) and its OPEID (the field ope8_id, the full eight-digit form), so the outcomes layer lines up exactly with the IPEDS institutional layer on the UnitID, and with any pure financial-aid dataset on the OPEID. The practical consequence is that joining nces_ipeds to college_scorecard is a straightforward equi-join on UnitID: no string normalization, no resolving “Univ. of Calif., Berkeley” against “University of California-Berkeley,” no manual adjudication of institution names. The one place that demands care is the main-campus versus branch distinction: a single OPEID can subsume multiple physical locations, and IPEDS may carry parent and child UnitIDs, so an analyst aggregating outcomes has to decide deliberately whether to roll up to the OPEID level or keep institutions split at the UnitID level. Get that decision right and the rest of the join is mechanical.
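The join and the roll-up decision can be sketched in pandas. This is a minimal illustration on made-up rows, not the production pipeline: the column names (unitid, opeid, avg_net_price, earnings_10yr_median) and every value are assumptions chosen to mirror the two tables described above. It relies on the convention that the first six digits of an eight-digit OPEID identify the parent institution and the last two the location.

```python
import pandas as pd

# Toy stand-ins for the two tables; names and values are illustrative only.
nces_ipeds = pd.DataFrame({
    "unitid": [100001, 100002, 100003],
    "institution_name": ["Main Campus", "Branch Campus", "Other College"],
    "avg_net_price": [15000, 14000, 22000],
})
college_scorecard = pd.DataFrame({
    "unitid": [100001, 100002, 100003],
    "opeid": ["00123400", "00123401", "00999900"],  # kept as zero-padded strings
    "earnings_10yr_median": [52000, 48000, 41000],
})

# Equi-join on UnitID. validate="one_to_one" raises if either side has
# duplicate UnitIDs, so a broken key fails loudly instead of fanning out rows.
joined = nces_ipeds.merge(
    college_scorecard, on="unitid", how="inner", validate="one_to_one"
)

# Optional roll-up key: the six-digit OPEID prefix groups branch locations
# under their parent institution.
joined["opeid6"] = joined["opeid"].str[:6]
campuses_per_parent = joined.groupby("opeid6")["unitid"].nunique()
print(campuses_per_parent)
```

Keeping the OPEID as a string matters here: read as an integer, the leading zeros disappear and the six-digit prefix no longer lines up.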

The program-level layer: CIP codes and field of study

The most important recent advance in the worth-it question is that it no longer has to be answered only at the level of the whole institution. A university's institution-wide median earnings figure blends the petroleum engineers with the poets, which tells a prospective nursing student very little. The Scorecard's program-level data disaggregates outcomes by field of study, using the Classification of Instructional Programs (CIP)—the standard federal taxonomy of academic programs—crossed with the credential level (certificate, associate's, bachelor's, graduate). The unit of analysis becomes the program: this CIP code, at this credential level, at this institution.

This is where the data becomes genuinely decision-useful, because it answers the question students actually face: not “is this college worth it” in the abstract but “is this major at this college worth it.” Program-level earnings and debt reveal patterns the institutional averages hide: an elite university with strong overall outcomes can still host a credential whose graduates carry more debt than their earnings will comfortably service, while a modest regional school can host an allied-health or skilled-trade program whose graduates out-earn the alumni of far more prestigious institutions. The program layer is also the level at which federal accountability policy has increasingly operated. The recurring gainful employment framework, which conditions a program's Title IV eligibility on whether its graduates' debt is sustainable relative to their earnings, is fundamentally a CIP-by-credential calculation. Working at this grain demands care, because the CIP-credential cells are small and small cells are heavily suppressed for privacy, but it is where the value question gets its sharpest answer.
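A program-level calculation at this grain might look like the sketch below. The rows, the column names, and above all the cutoff are assumptions for illustration: ILLUSTRATIVE_LIMIT is a hypothetical threshold, not the regulatory gainful-employment test, which is defined on loan payments rather than on raw balances.

```python
import pandas as pd

# One row per (institution, CIP code, credential level); values are made up.
programs = pd.DataFrame({
    "unitid": [100001, 100001, 100002],
    "cip_code": ["5138", "2301", "5138"],      # illustrative 4-digit CIP strings
    "credential_level": ["bachelor", "bachelor", "associate"],
    "median_debt": [24000.0, 27000.0, None],   # None marks a suppressed cell
    "median_earnings": [60000.0, 30000.0, 45000.0],
})

# Hypothetical cutoff for this sketch only; not a federal standard.
ILLUSTRATIVE_LIMIT = 0.75

programs["debt_to_earnings"] = (
    programs["median_debt"] / programs["median_earnings"]
).round(2)


def status(ratio):
    # A suppressed cell is unknown, not a pass, so it gets its own label.
    if pd.isna(ratio):
        return "suppressed"
    return "high debt" if ratio > ILLUSTRATIVE_LIMIT else "ok"


programs["status"] = programs["debt_to_earnings"].map(status)
print(programs[["unitid", "cip_code", "credential_level",
                "debt_to_earnings", "status"]])
```

Labelling suppressed cells explicitly keeps them visible in any downstream count instead of letting them pass as programs with acceptable ratios.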

Sector, selectivity, and where value diverges

Once IPEDS and the Scorecard are joined, the single most illuminating dimension to cut by is sector—the IPEDS ownership control that sorts institutions into public, private-nonprofit, and for-profit. The three sectors occupy visibly different positions on the price-debt-earnings plane. Public institutions, subsidized by state appropriations, typically post the lowest net prices and, for in-state students, the most favorable debt-to-earnings positions. Private-nonprofit institutions span an enormous range, from heavily endowed schools whose generous aid drives net price well below sticker to tuition-dependent colleges where the net price stays high. The for-profit sector has historically clustered in the difficult corner of the plane—higher debt and weaker earnings relative to net price—which is precisely why federal accountability efforts have concentrated there.

The richest findings come from the places where the conventional signals diverge. The most common and most consequential divergence is between graduation rate and debt: an institution can graduate a high share of its students and still saddle them with debt their post-enrollment earnings cannot service, because finishing the degree is necessary but not sufficient for value—what matters is whether the credential commands earnings proportionate to its cost. The mirror-image case is just as instructive: a low-net-price institution with a middling graduation rate can still deliver excellent value for the students who do finish, because the debt they carry is small relative to what they go on to earn. Joining the datasets is what surfaces these cases. Looking at completion alone rewards selective schools that admit students likely to succeed anyway; looking at earnings alone rewards schools in high-wage regions or with high-earning-field mixes; only the joined picture—net price and completion from IPEDS, debt and earnings from the Scorecard—separates the institutions that actually convert tuition into earning power from those that merely enroll students who would have done well regardless.
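One rough way to surface the two divergence cases is to compare each institution with the median on both axes. The sketch below does that on invented rows; the column names follow the joined schema used in this article, and splitting at the median is an arbitrary assumption made for illustration, not an established threshold.

```python
import pandas as pd

# Toy joined table: graduation rate from IPEDS, debt and earnings from the Scorecard.
joined = pd.DataFrame({
    "institution_name": ["A", "B", "C", "D"],
    "sector": ["public", "private-nonprofit", "for-profit", "public"],
    "graduation_rate": [0.55, 0.88, 0.62, 0.81],
    "median_debt_completers": [12000, 30000, 28000, 18000],
    "earnings_10yr_median": [48000, 42000, 31000, 63000],
})
joined["earn_to_debt"] = (
    joined["earnings_10yr_median"] / joined["median_debt_completers"]
)

grad_mid = joined["graduation_rate"].median()
ratio_mid = joined["earn_to_debt"].median()

# High completion but debt that earnings struggle to cover.
high_grad_weak_payoff = joined[
    (joined["graduation_rate"] > grad_mid) & (joined["earn_to_debt"] < ratio_mid)
]
# Middling completion but strong payoff for those who finish.
low_grad_strong_payoff = joined[
    (joined["graduation_rate"] < grad_mid) & (joined["earn_to_debt"] > ratio_mid)
]

# The sector cut: median earnings-to-debt ratio per ownership type.
sector_medians = joined.groupby("sector")["earn_to_debt"].median()
print(high_grad_weak_payoff["institution_name"].tolist())
print(low_grad_strong_payoff["institution_name"].tolist())
print(sector_medians)
```

On real data the same filters would be run within a sector or selectivity band rather than across the whole table, for the reasons given in the caveats.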

Accessing the data: NCES, Urban Institute, and the Scorecard API

Both datasets are fully public, and there are three practical access paths. IPEDS is available directly from NCES, which publishes the raw survey files and a suite of web tools: the Data Center for custom downloads, College Navigator and the College Scorecard's consumer site for single-institution lookups, and complete survey-year files for bulk work. For programmatic access to IPEDS, the Urban Institute Education Data API is the most convenient route: it wraps the NCES IPEDS surveys in a clean, documented REST interface keyed by UnitID, requires no API key, and exposes the directory, enrollment, finance, and completions endpoints in a consistent JSON shape, sparing the analyst the work of stitching together the original per-survey NCES files and their separate data dictionaries.

The College Scorecard is distributed two ways. The bulk data files (institution-level and program-level CSVs, with a published data dictionary) are the right choice for national-scale work, because they ship the entire universe in one download with authoritative, version-stamped field definitions. For targeted, on-demand queries the Department offers the College Scorecard API, hosted on api.data.gov, which requires a free api.data.gov key passed as a query parameter and returns the cost, debt, earnings, and completion fields in JSON, filterable and field-selectable so a client can request only the columns it needs. Because the Scorecard API rows carry both id (UnitID) and ope8_id (OPEID), a workflow can pull outcomes from the Scorecard API and institutional detail from the Urban Institute IPEDS API and join the two on UnitID with no reconciliation step, which is the pattern the worked example below follows.

Python workflow: an earnings-to-debt value ratio

The script below queries the College Scorecard API for the institutions matching a name, pulls the four fields that anchor a value judgment (average net price, completion rate, median debt of completers, and median earnings ten years after entry) and computes two value metrics: an earnings-to-debt ratio (post-enrollment earnings per dollar of median debt) and an earnings-to-net-price ratio (earnings per dollar of annual net price). Both are higher-is-better. The script selects only the fields it needs, carries the UnitID (id) and OPEID (ope8_id) so the result can be joined straight to IPEDS, and shows how the same UnitID drives a no-key Urban Institute IPEDS lookup for institutional detail. Replace DEMO_KEY with a free api.data.gov key for anything beyond a quick test.

import pandas as pd
import requests

# Two federal higher-education sources, joined on the institution identifier.
#   1. College Scorecard API -- outcomes layer (cost, debt, earnings).
#      Requires a free api.data.gov key passed as ?api_key=...
#   2. Urban Institute Education Data API -- a clean REST wrapper over the
#      NCES IPEDS surveys (no key required), used here for institutional detail.
# The Scorecard carries both UNITID (the IPEDS key) and OPEID, so the two
# halves line up exactly without any fuzzy entity resolution.
SCORECARD = "https://api.data.gov/ed/collegescorecard/v1/schools"
IPEDS = "https://educationdata.urban.org/api/v1"
API_KEY = "DEMO_KEY"  # replace with your own api.data.gov key for real use


def scorecard(name, year="latest"):
    # Pull the outcomes fields for institutions matching a name.
    fields = [
        "id", "ope8_id",                       # UNITID (id) and 8-digit OPEID
        "school.name", "school.state",
        "school.ownership",                    # 1=public 2=private-NP 3=for-profit
        f"{year}.cost.avg_net_price.overall",  # net price after grant aid
        f"{year}.completion.rate_suppressed.overall",
        f"{year}.aid.median_debt.completers.overall",
        f"{year}.earnings.10_yrs_after_entry.median",
    ]
    params = {
        "school.name": name,
        "fields": ",".join(fields),
        "per_page": 50,
        "api_key": API_KEY,
    }
    r = requests.get(SCORECARD, params=params, timeout=60)
    r.raise_for_status()
    return pd.json_normalize(r.json()["results"])


def value_ratios(name, year="latest"):
    df = scorecard(name, year)
    if df.empty:
        print(f"No Scorecard match for {name!r}.")
        return df

    debt_col = f"{year}.aid.median_debt.completers.overall"
    earn_col = f"{year}.earnings.10_yrs_after_entry.median"
    net_col = f"{year}.cost.avg_net_price.overall"

    # Suppressed or missing values come back as null; coerce them to NaN so
    # the arithmetic and the formatted printing below do not fail.
    for col in (debt_col, earn_col, net_col):
        df[col] = pd.to_numeric(df[col], errors="coerce")

    # Two value metrics, both higher-is-better:
    #   earnings-to-debt  -- post-enrollment earnings per dollar of median debt
    #   earnings-to-price -- earnings per dollar of annual net price
    # Non-positive denominators are masked to NaN rather than producing inf
    # or a misleading negative ratio.
    df["earn_to_debt"] = (df[earn_col] / df[debt_col].where(df[debt_col] > 0)).round(2)
    df["earn_to_price"] = (df[earn_col] / df[net_col].where(df[net_col] > 0)).round(2)

    out = df[[
        "id", "ope8_id", "school.name", "school.state",
        net_col, debt_col, earn_col,
        "earn_to_debt", "earn_to_price",
    ]].sort_values("earn_to_debt", ascending=False)

    for _, row in out.iterrows():
        print(f"{row['school.name'][:38]:38}  UNITID={row['id']}  "
              f"earn10=${row[earn_col]:,.0f}  debt=${row[debt_col]:,.0f}  "
              f"E/D={row['earn_to_debt']}  E/P={row['earn_to_price']}")
    return out


# Worked example: compare the value ratios for a search term.
value_ratios("University of California")
# IPEDS detail for one institution by UNITID (no key needed):
# requests.get(f"{IPEDS}/college-university/ipeds/directory/2021/"
#              "?unitid=110635").json()

Two refinements turn this from a demonstration into analysis. First, the earnings-to-debt ratio computed here uses the median debt of completers; a fuller value picture must weight for the students who borrow and do not finish, whose debt-without-credential is the worst outcome the data records, and must consider the repayment measures alongside the raw debt level, because a manageable balance on solid earnings is categorically different from a smaller balance that nonetheless goes into default. Second, an institution-wide ratio blends every field of study; the decision-relevant version of this calculation runs against the program-level Scorecard files, computing the earnings-to-debt ratio per CIP code and credential level so that the answer is specific to the major a student is actually weighing, subject to the suppression of small program cells discussed under the limitations below.
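The first refinement can be approximated with a simple descriptive measure. The sketch below sets the completer ratio next to a rough "uncredentialed debt per entrant" figure. It is a heuristic under stated assumptions, not a Scorecard-defined metric: the column names and values are invented, and multiplying a median by a rate is only an approximation of an expected value.

```python
import pandas as pd

# Two hypothetical institutions with identical completer outcomes
# but very different completion rates.
df = pd.DataFrame({
    "institution_name": ["A", "B"],
    "completion_rate": [0.80, 0.40],
    "median_debt_completers": [20000.0, 20000.0],
    "median_debt_noncompleters": [9000.0, 9000.0],
    "earnings_10yr_median": [50000.0, 50000.0],
})

# The completer-only ratio cannot tell the two apart.
df["earn_to_debt"] = df["earnings_10yr_median"] / df["median_debt_completers"]

# Rough debt per entrant that ends up attached to no credential:
# the share who do not finish times the typical balance they leave with.
df["uncredentialed_debt_per_entrant"] = (
    (1 - df["completion_rate"]) * df["median_debt_noncompleters"]
).round(0)

print(df[["institution_name", "earn_to_debt",
          "uncredentialed_debt_per_entrant"]])
```

Both rows share the same earnings-to-debt ratio, yet the second carries three times the uncredentialed debt per entrant, which is the difference the completer-only ratio hides.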

Before you load the tables

Below are the columns we keep and the shape the two halves take once joined on UnitID. The IPEDS fields supply cost and process; the Scorecard fields supply debt and outcome; the identifier columns make the join exact.

-- join keys (carried by both halves via the Scorecard) ---------------
unitid                      -- IPEDS UnitID: the universal institution key
opeid                       -- Office of Postsecondary Education ID (Title IV)
-- IPEDS institutional side (nces_ipeds) ------------------------------
institution_name            -- institution name
state                       -- state / jurisdiction
sector / control            -- public, private-nonprofit, for-profit
total_enrollment            -- fall enrollment headcount
tuition_fees                -- published in-state tuition and fees
avg_net_price               -- price after grant/scholarship aid (cost)
admission_rate              -- selectivity
graduation_rate             -- completion within the standard window
-- College Scorecard outcomes side (college_scorecard) ----------------
median_debt_completers      -- median federal loan debt of completers
repayment_rate              -- share of borrowers paying the balance down
earnings_10yr_median        -- median earnings 10 years after entry
cip_code                    -- field of study (program-level layer)
credential_level            -- certificate, associate, bachelor, graduate

Limitations and analytical caveats

The joined IPEDS-Scorecard picture is the best public evidence on the value of college, but a value ratio is a sharp instrument that cuts both ways, and several limitations have to be held in view before any ranking is trusted.

The earnings figures cover only federally aided students. The Scorecard's debt and earnings measures are built from the records of students who received federal aid (loans or Pell grants), not from the entire student body. At institutions where a large share of students pay without federal aid, the reported cohort is unrepresentative, and the earnings figure describes the aided subset rather than the typical graduate. The measure is also keyed to entry, not graduation, so the earnings cohort mixes completers and non-completers unless the program-level view is used.

Selection effects masquerade as value. The deepest trap in this data is mistaking the students for the school. Selective institutions admit students who would have earned well anywhere; a high earnings figure can reflect who got in rather than what the education added. Disentangling an institution's contribution from its incoming class's characteristics requires controlling for selectivity, field mix, and student demographics—the raw ratio does not, and any honest comparison should hold those factors as constant as the data allows rather than ranking schools naively.
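A crude first step toward holding selectivity constant is to rank institutions within a peer band instead of across the whole table. The sketch below uses invented rows and a hypothetical two-band split on admission rate; it is a stratification for illustration, not a causal control, and a real analysis would add sector, field mix, and demographics.

```python
import pandas as pd

df = pd.DataFrame({
    "institution_name": list("ABCDEF"),
    "admission_rate": [0.10, 0.15, 0.20, 0.70, 0.80, 0.90],
    "earn_to_debt": [3.0, 4.0, 3.5, 2.0, 3.2, 2.6],
})

# Hypothetical two-band split; the 0.5 cut point is arbitrary.
df["band"] = pd.cut(
    df["admission_rate"],
    bins=[0.0, 0.5, 1.0],
    labels=["selective", "less_selective"],
)

# Naive rank across all schools versus rank among selectivity peers.
df["raw_rank"] = df["earn_to_debt"].rank(ascending=False).astype(int)
df["peer_rank"] = (
    df.groupby("band", observed=True)["earn_to_debt"]
    .rank(ascending=False)
    .astype(int)
)

print(df.sort_values(["band", "peer_rank"]))
```

In this toy table, school E sits third in the naive ranking but first among its less selective peers, while school A drops to last within the selective band.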

Suppression hollows out the small cells. To protect privacy, the Department suppresses outcome figures computed on small numbers of students—which is exactly where the program-level CIP-by-credential analysis is most useful and most fragile. Small programs, small institutions, and rare fields of study are disproportionately missing their earnings and debt values, so a program-level value table is densest for large, common programs and sparsest for the niche ones a particular student may care about most. Treating a suppressed cell as a zero, or dropping it silently, biases any aggregate built on top of it.
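The bias from mishandled suppression is easy to demonstrate. The sketch below assumes the suppression marker in the file is the literal string PrivacySuppressed (confirm this against the data dictionary for the release in use) and uses invented program rows with a hypothetical column name.

```python
import pandas as pd

# Raw column as it might arrive from a CSV: numbers stored as text,
# with suppressed cells carrying a marker string.
raw = pd.DataFrame({
    "cip_code": ["5138", "2301", "1107", "5009"],
    "median_earnings": ["61000", "PrivacySuppressed", "78000", "PrivacySuppressed"],
})

# Coerce to numeric: anything non-numeric becomes NaN, not zero.
raw["earnings"] = pd.to_numeric(raw["median_earnings"], errors="coerce")

# Report coverage alongside any aggregate so readers know how much is missing.
coverage = raw["earnings"].notna().mean()

mean_observed = raw["earnings"].mean()               # NaN cells are skipped
mean_zero_filled = raw["earnings"].fillna(0).mean()  # the biased alternative

print(f"coverage={coverage:.0%}  observed mean={mean_observed:,.0f}  "
      f"zero-filled mean={mean_zero_filled:,.0f}")
```

Skipping the missing cells is not neutral either, since suppression is concentrated in small programs; publishing the coverage figure with every aggregate is the minimum safeguard.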

A single number is not the same as worth. An earnings-to-debt ratio reduces a profound personal decision to a financial quotient, and the data's own framing invites that reduction. But the value of an education is not only its wage premium: fields with high social value and modest pay, the consumption value of learning itself, regional cost-of-living differences that the national earnings figures do not adjust for, and the time lag between enrollment and the labor market the cohort actually faces all sit outside the ratio. Held with that humility, the joined nces_ipeds and college_scorecard tables are an extraordinary resource—the first time the price of an American degree and its measured payoff can be placed in the same row, school by school and program by program—and the right use of them is to inform the worth-it question, not to pretend it has a single arithmetic answer.
