NCES IPEDS: The Federal Database Behind Higher Education Statistics for 6,000 US Colleges
The National Center for Education Statistics Integrated Postsecondary Education Data System collects annual data from every Title IV-eligible institution in the United States — 6,000+ colleges and universities reporting enrollment, graduation rates, tuition, faculty salaries, financial aid, and institutional finances — making IPEDS the most comprehensive federal database of US higher education.
What IPEDS is
IPEDS is the primary federal data collection for postsecondary education in the United States. It sits within the National Center for Education Statistics (NCES), which is itself a component of the Institute of Education Sciences (IES) inside the Department of Education. NCES is the federal statistical agency charged with collecting and analyzing data related to education in the United States and other nations; IES, created by the Education Sciences Reform Act of 2002, is its parent research arm.
The system traces its origins to the Higher Education General Information Survey (HEGIS), which the federal government administered from the 1960s onward to collect basic data about colleges and universities. IPEDS replaced HEGIS in 1986, consolidating previously fragmented surveys into a unified annual collection covering the full postsecondary sector. The legal basis for participation changed as well: whereas HEGIS was largely voluntary, IPEDS reporting is mandatory for all institutions that participate in federal student financial aid programs under Title IV of the Higher Education Act, a requirement added by the 1992 amendments to that act. Because virtually every accredited US college and university (public, private nonprofit, and for-profit alike) accepts federal student loans and Pell Grants, the IPEDS universe covers essentially the complete population of degree-granting and certificate-granting postsecondary institutions in the country.
The roughly 6,000 institutions currently in IPEDS span a wide typological range:
- Public 4-year universities — flagship research universities, regional comprehensives, and historically Black colleges and universities (HBCUs) operating under state authority. These enroll the plurality of US undergraduates and account for the largest share of research expenditures in the IPEDS finance surveys.
- Private nonprofit 4-year institutions — ranging from elite research universities with multi-billion-dollar endowments to small liberal arts colleges with fewer than 1,000 students. Financial data for this sector reflects complex investment and gift income structures absent from public institutions.
- For-profit 4-year institutions — a sector that expanded rapidly from the 1990s through roughly 2013 and has contracted substantially since, driven by federal regulatory action and enrollment declines following investigative scrutiny of student outcomes. IPEDS data captures the contraction in enrollment and financial figures.
- Public 2-year institutions — community colleges, which serve more than 40% of all US undergraduates and are the primary gateway to postsecondary education for working adults, first-generation students, and lower-income populations.
- Private nonprofit and for-profit 2-year institutions — including trade schools, culinary institutes, allied health programs, and other specialized shorter-term credential providers.
Data is submitted through the IPEDS web-based data collection system, a secure institution-facing portal maintained by RTI International under contract to NCES. Each institution designates a Keyholder — typically an institutional research officer or registrar — who is responsible for submitting the annual surveys. The collection cycle runs from fall through spring of each academic year, with different survey components opening and closing on a rolling schedule. Submitted data goes through NCES edit checks that flag statistical anomalies and require institutions to explain or correct implausible values before the data is finalized.
The twelve survey components
IPEDS is not a single survey but a collection of twelve separate survey components, each capturing a distinct domain of institutional activity. They are designed to be linked by a common institution identifier (the IPEDS Unit ID, a six-digit number that persists across years) but are distributed as separate download files. Understanding what each component covers is essential before attempting to merge them.
- Institutional Characteristics (IC) — the directory survey. Covers institution name, address, Carnegie Classification, control (public/private nonprofit/for-profit), level (2-year/4-year/less-than-2-year), accreditation, academic calendar type, open admissions status, tuition and fees, and whether the institution offers distance education. This is the reference table that most IPEDS analyses join against to get institutional metadata.
- Fall Enrollment (EF) — headcount enrollment as of the institution's official fall enrollment census date, broken down by level (undergraduate/graduate/first-professional), attendance status (full-time/part-time), student type (first-time/transfer/continuing), gender, race/ethnicity, and residency (in-state/out-of-state/foreign). The race/ethnicity categories follow the federal OMB standards: Hispanic/Latino, American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, White, Two or more races, Race/ethnicity unknown, and Nonresident alien.
- 12-Month Enrollment (E12) — an unduplicated headcount of students enrolled at any point during the 12-month academic year, including summer sessions. Because community colleges and continuing education programs serve substantial summer populations not captured in fall headcounts, E12 often gives a more complete picture of institutional reach than EF alone. E12 also drives the full-time equivalent (FTE) enrollment calculation used in many finance and staffing analyses.
- Completions (C) — degrees and certificates awarded during the academic year, categorized by award level (certificate, associate, bachelor, master, doctoral, first-professional) and the Classification of Instructional Programs (CIP) code of the program. CIP codes are a six-digit taxonomy maintained by NCES: the first two digits identify the discipline family, the next two the subdiscipline, and the last two the specific program. Completions data is the primary source for analyses of what credentials US colleges produce and in what fields.
- Graduation Rates (GR) — tracks the first-time, full-time undergraduate cohort that entered the institution in a given fall and measures what fraction graduated within 150% of normal program time. For 4-year programs, 150% of normal time is six years; for 2-year programs, it is three years. GR also captures transfer-out rates, exclusion counts (students who died or are permanently disabled), and whether the institution receives transfer students who would be counted in a sending institution's denominator.
- 200% Graduation Rates (GR200) — an extension of the GR survey tracking cohorts through 200% of normal time (eight years for 4-year programs, four years for 2-year programs). Particularly relevant for community colleges where a substantial fraction of students who eventually complete do so on extended timelines due to part-time enrollment, work obligations, and stop-out patterns. GR200 often reveals a materially higher completion rate than GR150 for the 2-year sector.
- Outcome Measures (OM) — a broader graduation rate framework that captures all degree-seeking undergraduates, not just the first-time, full-time cohort. OM tracks four cohort types (first-time full-time, first-time part-time, non-first-time full-time, non-first-time part-time) and reports outcomes at 4, 6, and 8 years. For institutions where transfer students and part-time students dominate — which describes most community colleges and many regional comprehensives — OM provides a more representative picture of institutional completion performance than the traditional GR metric.
- Finance (F) — revenues, expenses, assets, and liabilities for the fiscal year. Revenue categories include tuition and fees (net of discounts), government appropriations (federal and state), government grants and contracts, private gifts and grants, investment return, sales and services of auxiliary enterprises (dormitories, dining, athletics), and hospitals. Expense categories are organized by functional classification: instruction, research, public service, academic support, student services, institutional support, scholarships and fellowships, and auxiliary enterprises. Separate reporting formats exist for public institutions (GASB standards) and private institutions (FASB standards).
- Student Financial Aid (SFA) — covers the full-year undergraduate population that received any type of financial aid: federal grants (Pell, SEOG), federal loans, state grants, institutional grants, and other aid. The SFA survey also includes average net price by income bracket: the actual cost after all grant aid for students in five family income bands ($0–$30K, $30K–$48K, $48K–$75K, $75K–$110K, $110K+). These are fixed dollar bands rather than true quintiles of the income distribution, although they are often loosely called quintiles. Net price by income bracket is the single most useful field for understanding institutional affordability across the income distribution.
- Human Resources / Salaries (HR/S) — staff counts and salary data by occupation category, faculty rank, gender, and race/ethnicity. Faculty salary data distinguishes contract length (9/10-month vs. 11/12-month), rank (professor/associate/assistant/instructor/lecturer/no rank), and academic discipline. The occupational staff counts use the Standard Occupational Classification system and capture executive administrators, instructional staff, research staff, student services staff, library staff, and service/maintenance workers.
- Fall Staff (S) and Employees by Assigned Position (EAP) — two components that together provide a detailed census of institutional employees. Fall Staff covers instructional staff by full-time/part-time status and faculty designation. EAP covers all employees by primary occupational activity and whether they are full-time or part-time, providing an institution-wide workforce picture that HR/S alone does not give.
- Academic Libraries (AL) — library collections, staff, expenditures, and digital resource subscriptions. While specialized, library data is useful for research infrastructure analysis and for identifying institutional investment in scholarly resources as a proxy for research orientation.
Enrollment and completions data in depth
The Fall Enrollment survey is typically the first component analysts examine because it answers the most basic question about any institution: who goes there? The race/ethnicity breakdown in EF reflects OMB standards and reports nine mutually exclusive categories, with students who identify as two or more races counted in a separate “Two or more races” cell rather than being allocated proportionally. Nonresident aliens — students holding temporary or student visas — are reported separately and are not distributed across race/ethnicity categories, which affects denominator calculations for minority-serving institution designations.
Residency breakdowns (in-state, out-of-state, foreign) in EF are reported for first-time undergraduate students at degree-granting institutions and are particularly useful for analyses of institutional tuition revenue, since in-state and out-of-state students typically pay substantially different sticker prices at public universities. Out-of-state enrollment as a share of total enrollment has grown substantially at many flagship public universities since 2008, reflecting deliberate strategies to capture higher tuition revenue from non-resident students following state funding cuts.
The Completions survey captures every degree and certificate conferred during the academic year at the two-digit, four-digit, and six-digit CIP code level. At the six-digit level, the taxonomy distinguishes, for example, between Computer Science (11.0701), Computer Programming/Programmer, General (11.0201), and Information Technology (11.0103). Analysts using completions data to study workforce supply in specific technical fields need to understand which CIP codes map to the occupation categories they care about; NCES and the Bureau of Labor Statistics publish CIP-to-SOC crosswalks for this purpose.
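The two-, four-, and six-digit levels can all be derived from a single code string. The following is a minimal sketch, assuming codes arrive in the dotted `XX.XXXX` form used above; the helper name `cip_levels` is hypothetical and not part of any NCES tool, and the zero-padding step is an assumption about how codes can be damaged when a file stores them as numbers.

```python
def cip_levels(code: str) -> dict:
    """Split a dotted six-digit CIP code into its three taxonomy levels."""
    family, _, rest = code.strip().partition(".")
    # Assumption: a code stored as a number may have lost zeros
    # (e.g. "01.0100" read back as "1.01"), so restore the fixed widths.
    family = family.zfill(2)
    rest = rest.ljust(4, "0")
    if not (family.isdigit() and rest.isdigit() and len(family) == 2 and len(rest) == 4):
        raise ValueError(f"not a six-digit CIP code: {code!r}")
    return {
        "two_digit": family,                   # discipline family
        "four_digit": f"{family}.{rest[:2]}",  # subdiscipline
        "six_digit": f"{family}.{rest}",       # specific program
    }


# The three codes cited in the text share the same two-digit family.
for code in ["11.0701", "11.0201", "11.0103"]:
    print(code, cip_levels(code))
```

Grouping completions by the `four_digit` or `two_digit` value is usually the first step before applying a CIP-to-SOC crosswalk, since crosswalk rows are keyed on the full six-digit code.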
Graduation rates in the GR component track only the first-time, full-time freshman cohort — a methodological choice that has drawn sustained criticism from community college researchers and equity scholars because it excludes the majority of students at open-access institutions. A community college where 70% of students enroll part-time, 40% are transfer students with prior college credits, and 30% are returning adults who stopped out has almost none of those students in its GR150 denominator. The measured completion rate — often 20–30% for community colleges — reflects this methodological restriction more than actual institutional performance for the full student body.
The 200% Graduation Rate (GR200) component and the Outcome Measures (OM) component address this partially. GR200 tracks the same first-time, full-time cohort through a longer window and consistently shows materially higher completion at community colleges: many institutions where the 3-year GR is 25% show 4-year GR200 rates of 35–45%, because a substantial fraction of students complete on a 4-year timeline rather than the expected 2-year timeline. OM goes further by including all degree-seeking students regardless of first-time or full-time status, and analysts studying community college performance should generally use OM rather than GR for cross-institutional comparisons.
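The relationship between the 150% and 200% rates is plain arithmetic on a single cohort. The sketch below uses invented counts for a hypothetical 2-year institution and assumes the denominator is the entering cohort minus allowable exclusions; the function name `grad_rate` is illustrative only.

```python
def grad_rate(completers: int, cohort: int, exclusions: int = 0) -> float:
    """Percent of the adjusted cohort (cohort minus exclusions) that completed."""
    adjusted = cohort - exclusions
    if adjusted <= 0:
        raise ValueError("adjusted cohort must be positive")
    return 100.0 * completers / adjusted


# Invented counts: 1,000 first-time, full-time entrants, 10 allowable exclusions.
cohort, exclusions = 1000, 10
within_150 = 248  # completed within 3 years (150% of normal time)
within_200 = 356  # completed within 4 years; cumulative, includes the 248 above

gr150 = grad_rate(within_150, cohort, exclusions)
gr200 = grad_rate(within_200, cohort, exclusions)
print(f"GR150: {gr150:.1f}%  GR200: {gr200:.1f}%  gain: {gr200 - gr150:.1f} points")
```

Both rates share the same denominator, so any gain from GR150 to GR200 comes entirely from students who finished in the extra year. Neither rate says anything about part-time or transfer-in students, who never enter this cohort.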
Financial data
The Finance survey is among the most analytically rich and technically complex components in IPEDS. Revenues are reported net of contra-revenue items: tuition and fees revenue, for example, is reported after institutional grant and scholarship discounts are subtracted, so it reflects actual net tuition rather than gross sticker-price revenue. The gap between published tuition and the net tuition figure — the institutional discount rate — has risen dramatically at private nonprofit institutions since the 1990s, with many institutions now discounting 50% or more of gross tuition revenue through institutional grants.
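The discount-rate arithmetic can be made concrete with a short sketch. The dollar figures are invented and the function names are hypothetical helpers, not IPEDS variables. Because tuition is reported net of discounts, the sketch assumes gross tuition is rebuilt by adding institutional grants back to the net figure, and that institutional grants are the only discount involved.

```python
def gross_from_net(net_tuition: float, institutional_grants: float) -> float:
    """Rebuild gross (sticker-price) tuition revenue from the reported net figure."""
    return net_tuition + institutional_grants


def discount_rate(net_tuition: float, institutional_grants: float) -> float:
    """Institutional grants as a share of gross tuition revenue."""
    gross = gross_from_net(net_tuition, institutional_grants)
    if gross <= 0:
        raise ValueError("gross tuition must be positive")
    return institutional_grants / gross


# Invented figures for a hypothetical private nonprofit college, in dollars.
net_tuition = 38_000_000
institutional_grants = 42_000_000

gross = gross_from_net(net_tuition, institutional_grants)
rate = discount_rate(net_tuition, institutional_grants)
print(f"Gross tuition: ${gross:,}")
print(f"Discount rate: {rate:.1%}")
```

With these numbers the college returns more than half of its gross tuition as grants, which is the pattern the paragraph above describes for heavily discounting private institutions.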
Revenue by source reveals the business model of an institution. Public universities depend heavily on state appropriations and tuition; the ratio of these two sources has shifted dramatically since 2008 as state funding cuts increased the tuition dependence of public higher education. Private nonprofit research universities with large endowments derive substantial revenue from investment returns and private gifts; for some elite institutions, endowment distributions exceed net tuition revenue. For-profit institutions derive almost all revenue from tuition, which in turn comes overwhelmingly from federal student loans and Pell Grants.
Expense data organized by functional classification allows comparison of how institutions allocate resources. The instruction-to-total-expense ratio is a standard measure of educational mission focus: institutions spending a lower share on instruction and a higher share on institutional support (administration) or auxiliary enterprises may be prioritizing non-educational functions. Research expenditures in IPEDS Finance often differ from those reported in the NSF Higher Education Research and Development (HERD) survey because the two collections define and allocate research costs differently: IPEDS follows the functional classification in the institution's financial statements, while HERD applies its own R&D definition and itemizes spending by funding source, including institutionally funded research. Analysts comparing research intensity across institutions should use HERD rather than IPEDS Finance for research expenditure comparisons.
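As a sketch of the instruction-to-total-expense ratio, the snippet below computes each functional category's share of total expenses. The amounts are invented, and the dictionary keys are illustrative labels for the functional categories rather than actual IPEDS variable names.

```python
# Invented fiscal-year expenses in millions of dollars, keyed by
# illustrative labels for the Finance survey's functional categories.
expenses = {
    "instruction": 120.0,
    "research": 30.0,
    "public_service": 10.0,
    "academic_support": 25.0,
    "student_services": 20.0,
    "institutional_support": 35.0,
    "scholarships_fellowships": 15.0,
    "auxiliary_enterprises": 45.0,
}


def expense_shares(expenses_by_function):
    """Return each functional category as a fraction of total expenses."""
    total = sum(expenses_by_function.values())
    if total <= 0:
        raise ValueError("total expenses must be positive")
    return {name: amount / total for name, amount in expenses_by_function.items()}


shares = expense_shares(expenses)
# Print categories from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name:26s}{share:6.1%}")
```

When comparing a public and a private institution this way, the shares are only comparable after the GASB and FASB reporting formats have been reconciled.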
Endowment values are reported as the end-of-year market value of the institution's permanent endowment fund. The IPEDS endowment figure captures the total market value of all permanent and term endowments as of the fiscal year end, but does not report the spending rate, the asset allocation, or the return. NACUBO separately publishes the annual NACUBO-Commonfund Study of Endowments with investment return and spending policy data, which must be joined to IPEDS endowment figures for complete financial analysis.
For-profit institution financial data in IPEDS has been a particular area of analytical interest because IPEDS Finance captures only the higher education entity's finances, while many for-profit chains are subsidiaries of publicly traded companies. An institution's IPEDS revenue may look stable even as the parent company is drawing on credit lines to cover operating losses, a divergence visible only by combining IPEDS data with SEC financial disclosures for the parent corporation.
Student financial aid
The SFA survey covers the undergraduate population that received any financial aid during the full academic year — a broader population than the freshman cohort tracked in GR. Aid data is organized by type:
- Federal grants — primarily Pell Grants (the largest means-tested federal grant program, with awards up to $7,395 per year as of 2025–2026 for students from families below roughly $65,000 in income) and Supplemental Educational Opportunity Grants (SEOG, campus-based awards for Pell-eligible students).
- Federal loans — Direct Subsidized Loans (interest-free while enrolled), Direct Unsubsidized Loans, Direct PLUS Loans for parents, and Graduate PLUS Loans. The SFA survey captures loan volume for the undergraduate population; loan repayment outcomes are tracked separately through the National Student Loan Data System (NSLDS).
- State grants — state-funded need-based and merit-based grants, which vary enormously across states. Cal Grant in California, TAP in New York, and HOPE in Georgia represent three very different policy designs. IPEDS captures state grant dollars received by institution but does not identify which state programs contributed those dollars.
- Institutional grants — the discounts and fellowships funded from institutional revenues. At elite private universities, institutional grant aid often exceeds all other aid sources combined for enrolled students.
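The net price by income bracket data in SFA (often labeled “income quintile”, although the five bands are fixed dollar ranges rather than true quintiles) is calculated by NCES from the institutional data and represents the average amount students in each income bracket actually pay after subtracting all grant aid (but not loans, which must be repaid). The figures are based on first-time, full-time undergraduates who received federal Title IV aid, so they describe that cohort rather than every enrolled student. This is the most useful single-institution affordability metric available in federal data: it tells you what a family at a given income level can expect to pay at that specific institution, not the sticker price that very few students actually pay.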
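The arithmetic behind a net price figure can be sketched in a few lines. This illustrates the calculation only and is not the NCES procedure: the student records are invented, the cost of attendance is a single made-up number, and treating each band's upper bound as inclusive is an assumption.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import mean

# Upper bounds of the first four income bands; the fifth band is open-ended.
UPPER_BOUNDS = [30_000, 48_000, 75_000, 110_000]
LABELS = ["$0-30K", "$30K-48K", "$48K-75K", "$75K-110K", "$110K+"]


def income_bracket(family_income: float) -> str:
    """Map a family income to one of the five bands (upper bound inclusive)."""
    return LABELS[bisect_left(UPPER_BOUNDS, family_income)]


def avg_net_price(students, cost_of_attendance):
    """Average of (cost of attendance - grant aid) within each income band."""
    by_bracket = defaultdict(list)
    for family_income, grant_aid in students:
        by_bracket[income_bracket(family_income)].append(cost_of_attendance - grant_aid)
    return {label: mean(prices) for label, prices in by_bracket.items()}


# Invented (family income, total grant aid) pairs for aided students.
students = [(22_000, 21_000), (28_000, 19_000), (52_000, 12_000), (130_000, 2_000)]
net_prices = avg_net_price(students, cost_of_attendance=30_000)
for label in LABELS:
    if label in net_prices:
        print(f"{label:>10}: ${net_prices[label]:,.0f}")
```

Loans are deliberately absent from the calculation: they change how a family finances the net price, not the net price itself.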
Cohort default rates (CDR) are related to but not contained within IPEDS. The 3-year CDR — the fraction of a borrower cohort that defaults on their federal student loans within three years of entering repayment — is calculated by the Department of Education from NSLDS data and published separately. However, CDR is deeply relevant to IPEDS users because institutions with 3-year CDRs above 30% for three consecutive years, or above 40% in a single year, risk losing Title IV eligibility. Losing Title IV eligibility effectively means closure for most institutions because federal student aid finances the majority of student payments at all but the wealthiest private colleges. The CDR threshold creates a direct link between IPEDS financial aid data, NSLDS loan outcomes, and institutional survival.
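The two-part threshold can be expressed as a small check over a series of annual rates. This is a sketch of the rule as worded above, not a compliance tool: the function name `title_iv_at_risk` is hypothetical, and the strict greater-than comparison at exactly 30% or 40% is an assumption that should be checked against the current regulations.

```python
def title_iv_at_risk(cdrs, multi_year=30.0, single_year=40.0, run=3):
    """Return True if a series of 3-year CDRs (percent, oldest first) trips either rule.

    Rule 1: any single year above `single_year`.
    Rule 2: `run` consecutive years above `multi_year`.
    """
    if any(rate > single_year for rate in cdrs):
        return True
    streak = 0
    for rate in cdrs:
        # Extend the streak on a year over the threshold, otherwise reset it.
        streak = streak + 1 if rate > multi_year else 0
        if streak >= run:
            return True
    return False


# Invented rate histories for three hypothetical institutions.
print(title_iv_at_risk([31.0, 32.5, 33.0]))  # three consecutive years over 30%
print(title_iv_at_risk([31.0, 29.0, 33.0]))  # streak broken in the middle year
print(title_iv_at_risk([12.0, 41.0]))        # one year over 40%
```

Because the rates come from NSLDS rather than IPEDS, running a check like this across institutions requires joining the published CDR file to IPEDS records first.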
The College Scorecard relationship
IPEDS is an institutional-level reporting system: institutions submit aggregate counts, not individual student records. The Department of Education's College Scorecard, launched in 2015 and substantially expanded since, integrates IPEDS institutional data with individual-level federal records from two additional sources: NSLDS, which tracks every federal student loan and Pell Grant disbursement, and federal tax records held by the Department of the Treasury, which supply earnings for former students. The result is a linked dataset that connects IPEDS enrollment and completions information to actual post-enrollment labor market outcomes for the aid-receiving student population.
IPEDS serves as the institutional backbone of the Scorecard: enrollment figures, graduation rates, institutional characteristics, and financial aid flows in the Scorecard all derive from IPEDS submissions. The Scorecard adds what IPEDS cannot provide, namely individual-level outcomes, by linking former students through their Social Security Numbers to tax records that show what they earned at fixed intervals after entry (six, eight, and ten years in the institution-level files). The earnings linkage is restricted to students who received federal aid, which means the Scorecard systematically excludes students who attended college without loans or Pell Grants: a population concentrated at elite private universities where wealthy students can attend without borrowing.
The Scorecard's program-level data, added in 2020, links IPEDS CIP-code completions to program-specific earnings and debt outcomes. A prospective student choosing between computer science programs at two different universities can now look up not just the institution-level Scorecard metrics but field-of-study-specific earnings outcomes for CS graduates at each school. This granularity was unavailable before 2020 and has substantially expanded the range of accountability analyses possible with federal higher education data.
For analysts who need to move between IPEDS and Scorecard data, the common identifier is the IPEDS Unit ID, which appears as the UNITID field in IPEDS downloads and as the id field in the Scorecard API response. Matching on Unit ID is reliable; matching on institution name is not, because name variations, abbreviations, and legal entity changes introduce many false non-matches and false matches across datasets.
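The difference between the two matching strategies is easy to demonstrate. In the sketch below the Unit IDs and institution names are invented; `UNITID` and `INSTNM` are IPEDS column names, `id` and `school.name` are Scorecard field names, and the helper `join_on_unitid` is hypothetical. It also assumes the two sources may disagree on whether the ID is a string or an integer.

```python
# Invented rows standing in for an IPEDS download and a Scorecard API response.
ipeds_rows = [
    {"UNITID": "100001", "INSTNM": "Example State University"},
    {"UNITID": "100002", "INSTNM": "Sample College"},
]
scorecard_rows = [
    {"id": 100001, "school.name": "Example State Univ."},
    {"id": 100002, "school.name": "Sample College - Main Campus"},
]


def join_on_unitid(ipeds, scorecard):
    """Inner-join the two row lists on Unit ID, normalizing both sides to int."""
    by_id = {int(row["id"]): row for row in scorecard}
    return [
        {**row, **by_id[int(row["UNITID"])]}
        for row in ipeds
        if int(row["UNITID"]) in by_id
    ]


matched_by_id = join_on_unitid(ipeds_rows, scorecard_rows)

# Naive exact-name matching on the same rows, for comparison.
scorecard_names = {row["school.name"] for row in scorecard_rows}
matched_by_name = [row for row in ipeds_rows if row["INSTNM"] in scorecard_names]

print(f"matched on Unit ID: {len(matched_by_id)}")
print(f"matched on name:    {len(matched_by_name)}")
```

Both institutions match on Unit ID and neither matches on name, even though a human reader would recognize each pair immediately.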
Data access
The primary access point for IPEDS data is the IPEDS Data Center at nces.ed.gov/ipeds/datacenter. The Data Center provides several access modes:
- Compare Institutions — a web-based tool allowing side-by-side comparison of up to four institutions on any combination of IPEDS variables. Useful for quick lookups but not for bulk analysis.
- Use the Data — downloadable Access database and CSV files organized by survey component and collection year. Each component is available as a separate ZIP archive containing a data file, a dictionary file, and a frequencies file. The dictionary maps variable names (often cryptic abbreviations) to their full descriptions and response categories.
- Data Trends — longitudinal visualizations of key IPEDS metrics over time, useful for quick trend checks but not machine-readable.
- IPEDS API — a newer REST API that provides programmatic access to IPEDS data without downloading ZIP files. Documentation is available at educationdata.urban.org via the Urban Institute's Education Data Portal, which offers a more developer-friendly wrapper around IPEDS and other NCES datasets than the native NCES interface.
- IPEDS Research Data Center (RDC) — restricted-access microdata for researchers who need institution-level data below the public suppression threshold. Access requires a project proposal approved by NCES and analysis conducted in a secure remote environment; results must be reviewed before export.
The Urban Institute Education Data Portal at educationdata.urban.org deserves special mention as the most analyst-friendly IPEDS access point. The Portal integrates IPEDS with the Common Core of Data (CCD, NCES's K–12 companion to IPEDS), the Civil Rights Data Collection (CRDC), and other NCES datasets under a unified REST API with consistent variable naming, pagination, and filtering. The Portal documentation maps IPEDS survey components to API endpoint paths and provides sample queries in Python and R. For most IPEDS analyses, the Urban Institute API is faster and less error-prone than manually downloading and joining NCES ZIP archives.
The NCES Common Core of Data (CCD) is the K–12 analog to IPEDS — an annual collection of data about every public elementary and secondary school and school district in the United States. Analysts studying the pipeline from K–12 to postsecondary education use CCD and IPEDS together; geographic identifiers in both datasets allow county- and state-level analyses of high school graduation rates and subsequent college enrollment patterns.
Python: accessing IPEDS via the Urban Institute Education Data Portal
The code below uses the Urban Institute Education Data Portal API to pull graduation rate data for public universities and fall enrollment data broken down by race. The API requires no authentication for public data. Endpoints follow a path structure of /api/v1/college-university/ipeds/[component]/[year]/ with filter parameters passed as query strings. The per_page parameter controls page size; the API returns a next URL in the response for pagination when more records exist.
import requests
import pandas as pd

# Urban Institute Education Data Portal -- IPEDS API
# Docs: https://educationdata.urban.org/documentation/colleges.html
base = "https://educationdata.urban.org/api/v1"

# Get graduation rates for public institutions (first page of results only)
endpoint = f"{base}/college-university/ipeds/grad-rates/2022/"
params = {
    "control": 1,  # 1 = public
    "level_of_study": "undergraduate",
    "per_page": 100,
}
resp = requests.get(endpoint, params=params, timeout=20)
resp.raise_for_status()  # stop on HTTP errors rather than parsing an error body
data = resp.json()
results = data.get("results", [])
print(f"Public university graduation rates (2022): {data.get('count', 0)} records")

# Build DataFrame; the statistics below describe only the rows fetched above
df = pd.DataFrame(results)
if not df.empty and "grad_rate_150pct" in df.columns:
    df["grad_rate_150pct"] = pd.to_numeric(df["grad_rate_150pct"], errors="coerce")
    df_clean = df[df["grad_rate_150pct"].notna()].copy()
    print(f"Median 6-year graduation rate (public): {df_clean['grad_rate_150pct'].median():.1f}%")
    print(f"25th percentile: {df_clean['grad_rate_150pct'].quantile(0.25):.1f}%")
    print(f"75th percentile: {df_clean['grad_rate_150pct'].quantile(0.75):.1f}%")

# Get fall enrollment by race for public institutions
enroll_endpoint = f"{base}/college-university/ipeds/fall-enrollment/race/2022/"
enroll_params = {"level_of_study": "undergraduate", "per_page": 20, "control": 1}
enroll_resp = requests.get(enroll_endpoint, params=enroll_params, timeout=20)
enroll_resp.raise_for_status()
enroll_data = enroll_resp.json()
print(f"\nIPEDS fall enrollment by race records: {enroll_data.get('count', 0)}")
The Education Data Portal covers major IPEDS components including fall enrollment by race (fall-enrollment/race/), 12-month enrollment (enrollment-12-month/), completions by CIP code (completions/cip/), graduation rates (grad-rates/), finance (finance/), and institutional characteristics (institutional-characteristics/). The control parameter filters by institutional control: 1 for public, 2 for private nonprofit, 3 for for-profit. The level_of_study parameter distinguishes undergraduate from graduate populations where applicable. Variable names in the API response differ from variable names in the raw NCES downloads; the Portal documentation maps between the two naming conventions.
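A single request returns only one page, so any statistic computed from it describes that page rather than the full result set. The sketch below follows the `next` URL until it runs out, assuming the response shape described earlier (a `results` list plus a `next` URL that is empty on the last page). The helper names `fetch_json` and `fetch_all` are hypothetical, the `max_pages` cap is an added safety assumption, and the standard library is used so the example has no third-party dependencies.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=20) as resp:
        return json.load(resp)


def fetch_all(url, params=None, fetch=fetch_json, max_pages=50):
    """Collect `results` from every page by following each page's `next` URL."""
    if params:
        url = f"{url}?{urlencode(params)}"
    records = []
    pages = 0
    while url and pages < max_pages:
        page = fetch(url)
        records.extend(page.get("results", []))
        url = page.get("next")  # None or empty on the last page
        pages += 1
    return records
```

A call such as `fetch_all(endpoint, {"control": 1})` would then return every matching record, at the cost of one request per page. The `fetch` argument exists so the loop can be exercised against canned pages without touching the network.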
Analytical applications and known limitations
IPEDS data underpins a wide range of policy analyses, journalistic investigations, and institutional benchmarking studies. The most commonly cited applications are:
- Institutional benchmarking by sector (public 4-year, private nonprofit, community college) for enrollment trends, tuition revenue, staffing ratios, and graduation rates — the kind of analysis published annually by NCES in the Digest of Education Statistics and by organizations like the Delta Cost Project and the Hechinger Report.
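- Minority-serving institution designation and compliance analysis — IPEDS enrollment race/ethnicity data feeds eligibility determinations for enrollment-based federal designations such as HSI (Hispanic-Serving Institution), AANAPISI, and PBI (Predominantly Black Institution), which carry dedicated funding streams. HBCU status, by contrast, is defined by statute and founding date rather than by current enrollment.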
- State higher education accountability systems that use IPEDS graduation rates, enrollment shares, and completion by field to assess whether state universities are producing graduates in high-demand fields and meeting equity benchmarks.
- For-profit sector monitoring, where enrollment trends, graduation rates, default rates (from NSLDS), and financial stability indicators provide early warning of institutional fragility.
The most significant limitations are methodological. The graduation rate denominator restriction to first-time, full-time students, already discussed above, is the single largest source of misinterpretation in IPEDS-based analyses. Publications that rank community colleges by “graduation rate” using GR150 without acknowledging the denominator restriction are systematically misleading. A related problem affects institutions that deliberately enroll few first-time, full-time students — some community colleges have essentially no traditional freshman cohort and thus have GR denominators so small that the rate is statistically meaningless.
Finance data comparability across institutional types is limited by the GASB/FASB accounting difference. Public institutions use GASB standards, which classify auxiliary enterprise revenues and expenses separately; private institutions use FASB standards with different presentation. NCES provides technical guidance on reconciling the two formats for cross-sector comparisons, but direct ratio comparisons of, for example, instruction-to-revenue percentages between a public university and a private university require adjustment.
The annual collection cycle means IPEDS data is always 12–24 months behind current conditions. Data collected in fall 2024 and spring 2025 is typically finalized and publicly released in late 2025 or early 2026. During rapid enrollment shifts — the COVID-19 period saw the largest single-year enrollment decline in community colleges in IPEDS history — the lag between event and data release limits IPEDS's usefulness for real-time monitoring.
Suppression rules protect the privacy of individuals in cells with very few students. NCES suppresses cells where the count is between 1 and 4 and applies a complementary suppression rule to prevent back-calculation from related cells. For small institutions and small demographic subgroups, suppression can affect a substantial fraction of published cells. Analysts should check the IPEDS data dictionary for the imputation flag variables that identify suppressed cells; suppression is recorded in a separate indicator variable rather than as a missing value in the data itself.