NSF Research Grants: Mapping $9 Billion in Annual Basic Science Funding

13 min read · AI Analytics
Tags: NSF, Research, Science Funding, Federal Data

The National Science Foundation distributes more than $9 billion every year across roughly 12,000 research awards, and every one of those awards—the PI, the institution, the abstract, the dollar amount, the program officer, and the directorate—is a matter of public record. The NSF Award Search database at research.gov contains more than 600,000 awards stretching back to 1960, and a public JSON API makes the entire archive machine-readable with no authentication required. The dataset is the most comprehensive public window into federally funded basic research in the United States outside the life sciences, and it is almost completely unknown outside a narrow community of grants administrators and science policy analysts.

The NSF's structure and mission

Congress established the National Science Foundation in 1950 as an independent federal agency with a mandate to promote the progress of science and advance the national health, prosperity, and welfare. Unlike the National Institutes of Health, which focuses almost exclusively on biomedical research, NSF covers the full breadth of non-medical science: mathematics, physics, chemistry, computer science, engineering, earth science, social science, and STEM education. NSF funds roughly 25% of all federally funded basic research conducted at US colleges and universities outside the life sciences.

The agency operates through eight directorates, each responsible for a cluster of scientific disciplines:

  • Biological Sciences (BIO) — organismal biology, ecology, evolutionary biology, molecular and cellular biology, environmental biology. The one NSF directorate that borders on NIH territory, though NSF's BIO focus is on fundamental mechanisms rather than disease applications.
  • Computer and Information Science and Engineering (CISE) — algorithms, networks, artificial intelligence, human–computer interaction, software and hardware systems. CISE has become NSF's fastest-growing directorate as federal AI and computing priorities have accelerated.
  • Engineering (ENG) — chemical, civil, electrical, mechanical, biomedical, and industrial engineering. ENG houses the Engineering Research Centers (ERCs), among NSF's largest center-scale grant mechanisms.
  • Geosciences (GEO) — atmospheric science, ocean science, earth science, and polar programs. GEO funds major shared infrastructure including ocean research vessels and polar research stations.
  • Mathematical and Physical Sciences (MPS) — mathematics, statistics, physics, chemistry, materials science, and astronomy. MPS funds the observatory infrastructure (including contributions to LIGO and major telescope facilities) and is the primary home of theoretical and experimental physics.
  • Social, Behavioral and Economic Sciences (SBE) — psychology, economics, sociology, anthropology, political science, linguistics, and science and technology studies. SBE has historically received the lowest funding success rates of any directorate and faces recurring congressional scrutiny over the social sciences receiving federal money.
  • Technology Innovation and Partnerships (TIP) — created in 2022 as NSF's first new directorate in more than three decades, TIP focuses on use-inspired research and the translation of basic science into economic applications. It is the primary vehicle for the CHIPS and Science Act innovation investments.
  • Education and Human Resources (EHR), renamed the Directorate for STEM Education (EDU) in 2022 — STEM education research, teacher preparation, undergraduate and graduate fellowships, and broadening participation initiatives. EHR operates on roughly $1 billion per year and is functionally separate from the research directorates.

Grant types and the program structure

NSF grants fall into a taxonomy that determines duration, dollar scale, and review process. Understanding these categories is prerequisite to making sense of the award data.

Standard, continuing, and renewal grants are the core research mechanisms. A standard grant disburses the full committed amount; a continuing grant disburses funds in annual increments subject to satisfactory progress reviews. Most NSF research grants in the $500k–$1.5M range over three to five years are continuing grants. Renewal applications—formally called “successor proposals”—are reviewed competitively like new submissions and appear in the award database as distinct awards with new award numbers.

CAREER awards are the Faculty Early Career Development Program, the single most prestigious grant mechanism for junior faculty. CAREER awards require the PI to be in a tenure-track faculty position and to submit an integrated research and education plan. Awards run five years and are typically $500,000–$600,000, though amounts vary by directorate. The CAREER award functions as the most widely recognized signal of early-career research distinction in non-biomedical science; hiring and promotion committees at research universities treat a CAREER award as roughly equivalent to an NIH R01 for career-stage purposes in fields where NIH does not fund.

RAPID and EAGER are exploratory mechanisms with expedited review. RAPID (Rapid Response Research) allows NSF to fund urgent research in response to unanticipated events (natural disasters, disease outbreaks, geopolitical disruptions) without waiting for a standard review cycle. EAGER (Early-Concept Grants for Exploratory Research) supports high-risk, potentially transformative ideas that are too preliminary for standard review panels. Both mechanisms require the PI to contact a program officer before submitting and are typically reviewed internally rather than through open panel competition, which in practice favors researchers with established NSF relationships.

RUI and MRI serve specific institutional purposes. Research at Undergraduate Institutions (RUI) grants are standard research awards with a modified review criterion recognizing that the work is conducted in a teaching-intensive environment without doctoral students. Major Research Instrumentation (MRI) awards fund equipment acquisition and development for shared research facilities; a single MRI award can range from $100,000 to $4 million and may be the largest single grant a predominantly undergraduate institution ever receives.

Large-scale infrastructure and centers represent NSF's highest dollar commitments. Engineering Research Centers (ERCs) are multi-institutional, industry-partnered centers funded at roughly $20 million over 10 years. Science and Technology Centers (STCs) are similarly scaled for fundamental research without the industry-translation mandate. The National Ecological Observatory Network (NEON) is an NSF-funded continental-scale ecological monitoring infrastructure operated through a cooperative agreement with Battelle. These center-scale awards dominate the dollar totals in the award database but represent a small number of individual records.

Proposal submission and the review process

All NSF proposals are submitted through Research.gov, which replaced the legacy FastLane system as the primary submission portal in 2021. The governing document is the Proposal and Award Policies and Procedures Guide (PAPPG), updated annually, which specifies page limits, required sections, budget categories, formatting rules, and the conditions attached to awards. The PAPPG is the operational bible for NSF-funded researchers and grants administrators.

NSF uses a two-stage merit review process for most programs. Proposals are first distributed to ad-hoc reviewers—typically three to five subject-matter experts recruited from the research community—who submit individual written reviews. A panel of five to fifteen reviewers then meets (in person or virtually) to discuss the proposals against those individual reviews and produce panel summaries with recommended funding decisions. Program officers retain significant discretion in making final funding decisions, including funding proposals rated “Good” over those rated “Very Good” if program balance, institutional diversity, or other portfolio considerations warrant.

The two NSF merit review criteria have been unchanged since 1997: Intellectual Merit (the potential to advance knowledge within and across fields) and Broader Impacts (the potential to benefit society). Both criteria must be addressed explicitly in every proposal and evaluated separately by reviewers. The Broader Impacts criterion, which encompasses training of students, outreach activities, diversity, and societal relevance, is unique to NSF and a frequent source of confusion for researchers more familiar with NIH review criteria.

Funding rates vary substantially by directorate. Biological Sciences has historically funded around 25% of submitted proposals; CISE funds in the 20–22% range; SBE funds at 17–20%, the lowest of any directorate. These rates are averages across programs within each directorate and mask significant variation: competitive fellowship programs like the GRFP fund at roughly 16%, while some niche programs within MPS fund over 30% of submissions. NSF receives approximately 40,000–50,000 proposals per year across all programs and funds roughly 12,000.
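
The overall funding rate implied by these totals can be checked with a few lines of arithmetic. The sketch below uses only the round figures quoted in this section; the function name is illustrative and not part of any NSF tooling.

```python
def funding_rate(funded: int, submitted: int) -> float:
    """Fraction of submitted proposals that were funded."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return funded / submitted

# Round figures from the text: ~12,000 awards out of 40,000-50,000 proposals.
low = funding_rate(12_000, 50_000)
high = funding_rate(12_000, 40_000)
print(f"Implied overall funding rate: {low:.0%} to {high:.0%}")
```

The implied range sits above the rates quoted for CISE and SBE, which is what one would expect if higher-rate programs elsewhere pull the agency-wide average up.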

The award database: research.gov and the NSF API

The NSF Award Search at research.gov/award-simple-search provides a publicly searchable interface to the full awards database. Each award record contains:

  • Award Number — the unique seven-digit identifier, e.g., 2345678. Award numbers are stable permanent identifiers; the same number appears in publications, conference presentations, and researcher CVs.
  • Title — the proposal title, which for CAREER awards conventionally begins with “CAREER:” followed by the research title. Title prefix filtering is the most reliable way to isolate CAREER awards in API queries.
  • Principal Investigator and Co-PIs — PI name and, for awards with multiple investigators, co-PI names. Unlike NIH, NSF does not assign persistent researcher IDs in the public-facing award record, making longitudinal career tracking dependent on name matching.
  • Institution — awardee organization name, city, state, and UEI number. The state code enables geographic analysis and EPSCoR eligibility verification.
  • Start and Expiration Dates — the funded period. Expiration date frequently extends beyond the originally approved period through no-cost extensions, which appear as updated records in the database.
  • Obligated Amount — the total federal funds obligated to the award across all funded periods. For continuing grants this accumulates across annual increments.
  • Program Officer — the NSF program officer responsible for the award. Program officer patterns are a rich signal for understanding which scientific communities have representation inside NSF.
  • Division and Directorate — the organizational unit within NSF. Division names (e.g., Division of Computing and Communication Foundations) are more granular than directorate codes; the API returns the fund program name, which requires mapping to directorate for aggregate analysis.
  • Abstract — full text of the project abstract as submitted. Abstracts are the primary substrate for topic modeling, keyword trend analysis, and thematic clustering across the award corpus.
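
For analysis it can help to load each record into a typed structure as soon as it arrives. The sketch below is one possible shape, not an official schema: the JSON keys are the ones used by the script later in this article, the MM/DD/YYYY date format follows that script's own comment, and the sample record is invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

@dataclass
class AwardRecord:
    award_id: str
    title: str
    pi_name: str
    institution: str
    state: Optional[str]
    start: Optional[date]
    expires: Optional[date]
    obligated: float

    @property
    def is_career(self) -> bool:
        # CAREER awards conventionally carry a "CAREER:" title prefix.
        return self.title.upper().startswith("CAREER:")

def _parse_date(value: Optional[str]) -> Optional[date]:
    # Assumes MM/DD/YYYY strings; returns None for missing values.
    return datetime.strptime(value, "%m/%d/%Y").date() if value else None

def parse_award(raw: dict) -> AwardRecord:
    amount = str(raw.get("fundsObligatedAmt") or "0").replace(",", "")
    names = (raw.get("piFirstName"), raw.get("piLastName"))
    return AwardRecord(
        award_id=str(raw.get("id", "")),
        title=raw.get("title") or "",
        pi_name=" ".join(n for n in names if n),
        institution=raw.get("awardeeName") or "",
        state=raw.get("awardeeStateCode"),
        start=_parse_date(raw.get("startDate")),
        expires=_parse_date(raw.get("expDate")),
        obligated=float(amount),
    )

sample = {  # hypothetical record, not a real award
    "id": "2345678", "title": "CAREER: An Example Project",
    "piFirstName": "Ada", "piLastName": "Example",
    "awardeeName": "Example State University", "awardeeStateCode": "MT",
    "startDate": "07/01/2023", "expDate": "06/30/2028",
    "fundsObligatedAmt": "550000",
}
award = parse_award(sample)
print(award.award_id, award.is_career, award.obligated, award.expires.year)
```

Parsing dates and amounts once, at the boundary, keeps later grouping and filtering code free of string handling.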

The NSF API at https://api.nsf.gov/services/v1/awards.json requires no authentication and returns JSON responses. Parameters include keyword search, date ranges, institution, PI name, award ID, and a printFields parameter controlling which record attributes are returned. The API paginates at 25 records per page using an offset parameter. Bulk XML and CSV downloads are available through research.gov for full corpus access; the API is more practical for targeted queries and incremental analysis.
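
A request to this endpoint is an ordinary GET with query-string parameters. The sketch below only builds the URL and makes no network call. The helper name is hypothetical; the rpp and offset names come from the script later in this article, which starts its offset at 1; and the awardeeStateCode filter is an assumption to check against the API documentation.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.nsf.gov/services/v1/awards.json"

def build_award_query(keyword: str, page: int = 1, per_page: int = 25, **filters: str) -> str:
    """Return the request URL for one page of results (page is 1-based)."""
    if page < 1:
        raise ValueError("page must be >= 1")
    params = {
        "keyword": keyword,
        "rpp": per_page,
        # Assumes a 1-based record offset: page 1 -> 1, page 2 -> 26, ...
        "offset": (page - 1) * per_page + 1,
        **filters,
    }
    return f"{BASE_URL}?{urlencode(params)}"

# awardeeStateCode is an assumed filter name, used here for illustration.
url = build_award_query("quantum sensing", page=3, awardeeStateCode="MT")
print(url)
```

Keeping URL construction separate from the HTTP call makes the paging arithmetic easy to test without touching the live service.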

Institutional concentration and EPSCoR

NSF funding is heavily concentrated. The top 100 universities by NSF receipts capture roughly 75% of total award dollars in any given fiscal year. MIT, Stanford, Caltech, UC Berkeley, University of Michigan, University of Illinois at Urbana-Champaign, and Georgia Tech consistently appear at the top of institutional rankings. This concentration is not accidental: the merit review process systematically advantages institutions with larger research infrastructure, more competitive graduate programs to generate proposals, and more experienced grants administration offices that reduce administrative friction.
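
A concentration figure like this can be recomputed from award records by summing dollars per institution and taking the share held by the largest recipients. The following is a minimal sketch with invented data; the function name and the toy figures are assumptions for illustration, not NSF numbers.

```python
from collections import defaultdict

def top_n_share(awards, n):
    """Share of total dollars captured by the n largest recipients."""
    totals = defaultdict(float)
    for institution, amount in awards:
        totals[institution] += amount
    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0.0
    top = sorted(totals.values(), reverse=True)[:n]
    return sum(top) / grand_total

# Hypothetical (institution, obligated dollars) pairs for illustration only.
toy_awards = [
    ("University A", 600_000), ("University A", 400_000),
    ("University B", 500_000), ("College C", 300_000),
    ("College D", 200_000),
]
print(f"Top-2 share: {top_n_share(toy_awards, 2):.0%}")
```

On real data the result depends on how institution names are normalized, since the same awardee can appear under several spellings.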

EPSCoR, the Established Program to Stimulate Competitive Research, is NSF's primary mechanism for countering this concentration. EPSCoR designates 28 jurisdictions (states and territories) that historically receive a disproportionately small share of federal research funding relative to their research capacity; those jurisdictions receive preferential access to supplemental EPSCoR awards and a dedicated infrastructure improvement program. Eligibility is determined by NSF awards received over a rolling five-year window; jurisdictions that exceed a threshold share of total NSF funding graduate out of EPSCoR eligibility, which has historically been a slow process.
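
The eligibility rule described here amounts to a rolling-window share compared against a cutoff. The sketch below illustrates that calculation only: the dollar figures are invented, and the threshold value is a placeholder assumption rather than a number taken from this article.

```python
def rolling_share(state_totals, national_totals, end_year, window=5):
    """State's share of national funding over the `window` years ending at end_year."""
    years = range(end_year - window + 1, end_year + 1)
    state_sum = sum(state_totals.get(y, 0.0) for y in years)
    national_sum = sum(national_totals.get(y, 0.0) for y in years)
    return state_sum / national_sum if national_sum else 0.0

def is_eligible(share, threshold):
    """Eligible while the rolling share stays at or below the threshold."""
    return share <= threshold

# Hypothetical dollar figures (in millions) for one state and the nation.
state = {2019: 40, 2020: 42, 2021: 45, 2022: 50, 2023: 48}
nation = {2019: 8000, 2020: 8200, 2021: 8500, 2022: 8800, 2023: 9000}
THRESHOLD = 0.0075  # assumed cutoff, for illustration only

share = rolling_share(state, nation, end_year=2023)
print(f"Five-year share: {share:.4%}, eligible: {is_eligible(share, THRESHOLD)}")
```

Because the window averages over five years, a single strong year moves the share only slightly, which is consistent with graduation out of eligibility being slow.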

Historically Black Colleges and Universities (HBCUs) and Minority-Serving Institutions (MSIs) have dedicated NSF programs—the HBCU-UP (Undergraduate Program) and HSI (Hispanic-Serving Institution) programs within EHR—but remain substantially underrepresented in core research directorates relative to their student enrollment and, increasingly, their research faculty capacity. The gap between representation in EHR education programs and representation in BIO, CISE, and MPS research programs is a persistent policy concern that has generated multiple NSF advisory committee reports without producing commensurate funding shifts.

STEM education: GRFP, REU, and S-STEM

The Education and Human Resources directorate operates the most visible and highest-impact training programs in US graduate STEM education. The Graduate Research Fellowship Program (GRFP) is the most prominent: it provides a $37,000 annual stipend plus a $12,000 cost-of-education allowance for three years of graduate study, awarded to approximately 2,000 students per year from roughly 12,000 applicants. The GRFP is applied for and awarded before or during the first two years of a PhD program, before a student has produced substantial research, making it a bet on potential rather than demonstrated output. A GRFP fellowship functions as the most widely recognized external validation of a graduate student's early-career trajectory; it appears prominently in faculty job applications and postdoctoral fellowship competitions for a decade after the award.
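
The headline GRFP numbers combine into a total fellowship value and an acceptance rate. This short calculation uses only the figures quoted in the paragraph above; the constant names are illustrative.

```python
STIPEND = 37_000            # annual stipend quoted in the text
COST_OF_EDUCATION = 12_000  # annual allowance quoted in the text
YEARS_OF_SUPPORT = 3

per_year = STIPEND + COST_OF_EDUCATION
total_value = per_year * YEARS_OF_SUPPORT
acceptance_rate = 2_000 / 12_000  # fellows per applicant, round figures

print(f"Per-year value:  ${per_year:,}")
print(f"Total value:     ${total_value:,}")
print(f"Acceptance rate: {acceptance_rate:.1%}")
```

The resulting rate of about one in six matches the roughly 16% GRFP funding rate cited in the review-process section.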

Research Experiences for Undergraduates (REU) is a network of summer research programs at universities and national laboratories, each funded by a site grant that supports 8–12 undergraduate students per summer with stipends and housing. REU sites are searchable through the NSF Award Search; the program is a primary pipeline for undergraduates from non-research institutions into PhD programs. Research Experiences for Teachers (RET) operates similarly for K–12 educators, placing teachers in research laboratories for summer experiences intended to improve STEM instruction.

NSF Scholarships in Science, Technology, Engineering, and Mathematics (S-STEM) provides scholarships of up to $10,000 per year to academically talented, financially disadvantaged students in STEM undergraduate and graduate programs. S-STEM is institutionally administered—universities apply for cohort grants and then select individual student recipients—which means the award database reflects institutional awards rather than individual student recipients. The CAREER award is the faculty analog to the GRFP: the most widely recognized career signal for non-biomedical researchers at the transition from early- to mid-career.

Artificial intelligence and emerging technology investments

The period 2019–2024 saw a sustained acceleration in NSF AI funding that is visible in the award database through keyword analysis. The National AI Research Institutes program, launched in 2019, has funded 25 university-led institutes with a combined investment exceeding $200 million, each organized around a different application domain: AI for agriculture, AI for climate, AI for education, AI for scientific discovery. The institutes are collaborative structures spanning multiple universities and are funded as cooperative agreements rather than standard grants, appearing in the database as single large-dollar awards.

The creation of the Technology Innovation and Partnerships directorate in 2022 represents the most significant structural change to NSF in decades. TIP was established by the CHIPS and Science Act to create a dedicated pathway for translating federally funded basic research into economic and national security applications, a function that NIH handles through its SBIR/STTR programs and that, on the defense side, falls to DARPA, a separate agency entirely. The flagship TIP program is NSF Engines (Regional Innovation Engines), which awards 10-year cooperative agreements of up to $160 million each to develop regional innovation ecosystems anchored in science and technology strengths. The TIP directorate also coordinates NSF's quantum information science investments, semiconductor research (directly connected to CHIPS Act appropriations), and the Convergence Accelerator program that funds cohorts of teams working on defined societal challenges.

Abstract keyword analysis of the CISE directorate awards from 2018 to 2025 shows a pronounced shift: “deep learning,” “neural network,” “foundation model,” and “large language model” appear with rapidly increasing frequency while “data mining” and “information retrieval” have plateaued. Within MPS, quantum computing and quantum sensing terminology has grown faster than any other keyword cluster. These shifts are auditable directly from the NSF award abstract corpus without any additional data source.
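
One way to audit a shift like this is to measure, for each year, the fraction of abstracts that mention a phrase at least once, which keeps years with different award counts comparable. The sketch below assumes abstracts have already been loaded as (year, text) pairs; the function name and the toy abstracts are invented for illustration.

```python
from collections import defaultdict

def phrase_share_by_year(records, phrases):
    """Fraction of abstracts per year that mention each phrase at least once."""
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for year, abstract in records:
        totals[year] += 1
        text = abstract.lower()
        for phrase in phrases:
            if phrase in text:  # simple substring match, so plurals also count
                hits[year][phrase] += 1
    return {
        year: {p: hits[year][p] / totals[year] for p in phrases}
        for year in sorted(totals)
    }

# Hypothetical abstracts, invented for illustration.
toy = [
    (2018, "We study data mining methods for sensor logs."),
    (2018, "A neural network approach to data mining."),
    (2025, "We adapt a large language model for code."),
    (2025, "Foundation model evaluation with a neural network baseline."),
]
shares = phrase_share_by_year(
    toy, ["data mining", "neural network", "large language model"]
)
for year, row in shares.items():
    print(year, row)
```

Counting abstracts rather than raw occurrences avoids letting one abstract that repeats a term many times dominate the trend.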

International science agreements and conflict-of-interest compliance

NSF maintains bilateral research cooperation agreements with counterpart funding agencies in Germany (DFG), the United Kingdom (UKRI), Japan (JST), South Korea (NRF), and several other countries, as well as participation in EU Framework Programme partnerships. These agreements create jointly funded grants where collaborating teams submit coordinated proposals to their respective national agencies; the NSF side appears in the award database as a standard domestic award, with the international collaboration noted in the abstract rather than through a structured field. The Office of International Science and Engineering (OISE) coordinates these partnerships and administers the International Research Experiences for Students (IRES) program, which funds US students to conduct research at international partner institutions.

The period 2018–2023 saw significant changes to NSF requirements around foreign financial conflicts of interest. NSF adopted the requirement that all senior personnel on NSF-funded projects disclose current and pending support from all sources, including foreign government-sponsored talent recruitment programs, through a standardized Current and Pending Support form submitted via Research.gov. Knowingly and willfully omitting or misstating this information can be prosecuted as a false statement under 18 U.S.C. § 1001; several prosecutions arose from NSF-funded researchers who failed to disclose participation in Chinese government talent programs. These requirements have had measurable chilling effects on US–China scientific collaboration that are visible in the abstract corpus as a decline in explicitly China-referencing collaborative language from approximately 2019 onward.

Open science and data management requirements

NSF's public access requirements are set to become among the strictest of any federal funding agency. Under the Public Access Plan 2.0 that NSF released in 2023, NSF-funded publications are to be made freely available immediately upon publication with no embargo period, replacing the 12-month embargo that NSF, like NIH under its prior public access policy, previously permitted. This immediate open access mandate applies to peer-reviewed articles and conference papers resulting from NSF funding, regardless of which journal or publisher publishes them; publishers must either publish open access immediately or accept submission of the author accepted manuscript to NSF's designated Public Access Repository.

Data Management Plans (DMPs) have been required in NSF proposals since 2011: every proposal must include a two-page supplementary document describing how the research data generated will be managed, shared, and preserved. DMPs are reviewed as part of merit review but are not scored separately; the practical effect has been to establish data sharing as a community norm in NSF-funded research without uniformly enforcing specific data deposit requirements. NSF Award Conditions require data sharing and provide program officers with tools to enforce sharing through progress reporting and prior-approval requirements for no-cost extensions.

The reproducibility dimensions of the NSF data are sharpest in SBE. The replication crisis—the failure of a substantial fraction of published psychology and social science results to replicate under pre-registered conditions—has been most extensively documented in fields that NSF primarily funds. NSF has responded through dedicated reproducibility programs, support for the Open Science Framework, and increased emphasis on statistical power and pre-registration in SBE review criteria. The trajectory of these investments is traceable through SBE award abstracts.

Python: querying the NSF Awards API for CAREER grant analysis

import requests
import pandas as pd
from collections import Counter, defaultdict
import re

# NSF Awards API — https://api.nsf.gov/services/v1/awards.json
# No authentication required. Returns JSON with awards array.

BASE = "https://api.nsf.gov/services/v1/awards.json"

FIELDS = ",".join([
    "id", "title", "abstractText", "awardeeName", "awardeeStateCode",
    "piFirstName", "piLastName", "fundsObligatedAmt", "date",
    "startDate", "expDate", "pdPIName", "primaryProgram",
    "transType", "agency", "awardAgencyCode", "fundProgramName",
    "parentUeiNumber", "ueiNumber", "poName",
])

def fetch_career_awards(start_year: int, end_year: int) -> list[dict]:
    """
    Pull all CAREER awards funded between start_year and end_year.
    NSF CAREER award titles consistently contain 'CAREER:' as a prefix.
    We filter by keyword and date range, paginating through results.
    """
    all_awards = []

    for year in range(start_year, end_year + 1):
        # The API expects date filters in mm/dd/yyyy form
        date_from = f"01/01/{year}"
        date_to   = f"12/31/{year}"
        params = {
            "keyword": "CAREER",
            "dateStart": date_from,
            "dateEnd": date_to,
            "transType": "Grant",
            "fields": FIELDS,
            "printFields": FIELDS,
            "offset": 1,
            "rpp": 25,  # records per page (max 25)
        }

        # Page through results for this year
        while True:
            resp = requests.get(BASE, params=params, timeout=30)
            resp.raise_for_status()
            data = resp.json()
            awards = data.get("response", {}).get("award", [])
            if not awards:
                break
            all_awards.extend(awards)
            if len(awards) < 25:
                break
            params["offset"] += 25

    return all_awards


# Pull CAREER awards from the last 5 calendar years (2021-2025)
print("Fetching CAREER awards 2021-2025...")
awards = fetch_career_awards(2021, 2025)
print(f"Retrieved {len(awards)} CAREER awards")

# Build a tidy dataframe
rows = []
for a in awards:
    title = a.get("title", "")
    # Keep only genuine CAREER awards (title starts with CAREER:)
    if not re.match(r"^CAREER:", title, re.IGNORECASE):
        continue
    amt_raw = a.get("fundsObligatedAmt", "0") or "0"
    try:
        amount = float(str(amt_raw).replace(",", ""))
    except ValueError:
        amount = 0.0

    # Extract year from startDate field (format: MM/DD/YYYY)
    start_date = a.get("startDate", "")
    year = start_date.split("/")[-1] if start_date else "Unknown"

    rows.append({
        "award_id":    a.get("id"),
        "title":       title,
        "pi":          a.get("piFirstName", "") + " " + a.get("piLastName", ""),
        "institution": a.get("awardeeName"),
        "state":       a.get("awardeeStateCode"),
        "program":     a.get("fundProgramName"),
        "amount":      amount,
        "year":        year,
        "abstract":    a.get("abstractText", ""),
    })

df = pd.DataFrame(rows)
print(f"Genuine CAREER awards after title filter: {len(df)}")

# --- 1. Average CAREER award size by directorate ---
# The fundProgramName field encodes directorate context (e.g., "Division of
# Computing and Communication Foundations", "Division of Chemistry"). We map
# to directorate using keyword matching.

DIRECTORATE_MAP = {
    "BIO": ["Division of Biological", "Biological Sciences", "IOS", "MCB", "DEB", "EF"],
    "CISE": ["Computing", "Information Science", "Computer and Inf", "CCF", "CNS", "IIS", "OAC"],
    "ENG": ["Engineering", "CBET", "CMMI", "ECCS", "EFMA", "EEC"],
    "GEO": ["Geosciences", "Atmospheric", "Earth Sciences", "Ocean Sciences", "AGS", "EAR", "OCE"],
    "MPS": ["Mathematical", "Physical Sciences", "Chemistry", "Physics", "Astronomy", "DMS", "DMR", "PHY", "CHE", "AST"],
    "SBE": ["Social", "Behavioral", "Economic", "BCS", "SES", "SMA"],
    "TIP": ["Technology Innovation", "Translational", "TIP"],
    "EHR": ["Education", "Human Resources", "EHR", "DRL", "DGE", "HRD"],
}

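def _keyword_matches(keyword: str, program: str) -> bool:
    # Acronyms (all caps) must match as whole words, case-sensitively, so that
    # "EF" or "EAR" cannot match inside words such as "efficient" or "research".
    if keyword.isupper():
        return re.search(rf"\b{re.escape(keyword)}\b", program) is not None
    # Longer phrases are matched as case-insensitive substrings.
    return keyword.lower() in program.lower()

def map_directorate(program: str) -> str:
    if not program:
        return "Unknown"
    # First matching directorate wins, in DIRECTORATE_MAP order.
    for code, keywords in DIRECTORATE_MAP.items():
        if any(_keyword_matches(kw, program) for kw in keywords):
            return code
    return "Other"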

df["directorate"] = df["program"].apply(map_directorate)

dir_stats = (
    df.groupby("directorate")["amount"]
    .agg(
        award_count="count",
        total_funded="sum",
        avg_award="mean",
        median_award="median",
    )
    .sort_values("award_count", ascending=False)
)
print("\nCAREER awards by directorate:")
print(dir_stats.to_string())

# --- 2. Institutions ranked by total CAREER awards received ---
inst_rank = (
    df.groupby("institution")["award_id"]
    .count()
    .sort_values(ascending=False)
    .rename("career_award_count")
    .reset_index()
)
print("\nTop 15 institutions by CAREER award count:")
print(inst_rank.head(15).to_string(index=False))

# --- 3. Average award size per institution (min 3 awards) ---
inst_avg = (
    df.groupby("institution")
    .agg(count=("amount", "count"), avg=("amount", "mean"))
    .query("count >= 3")
    .sort_values("avg", ascending=False)
)
print("\nInstitutions with highest avg CAREER award (min 3 awards):")
print(inst_avg.head(10).to_string())

# --- 4. Fastest-growing research areas by abstract keyword ---
# Tokenise abstracts into content words, track year-over-year frequency

STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "in", "to", "for",
    "is", "are", "that", "this", "with", "on", "be", "will",
    "from", "at", "by", "as", "have", "it", "not", "we", "our",
    "project", "research", "study", "award", "nsf", "grant",
    "program", "career", "development",
}

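def top_words(text: str, n: int = 30) -> list[str]:
    """Return the first n content words of text (4+ letters, stopwords removed).

    This truncates rather than ranks: only the opening n content words of
    each abstract feed the yearly counts.
    """
    tokens = re.findall(r"[a-z]{4,}", (text or "").lower())
    return [t for t in tokens if t not in STOPWORDS][:n]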

# Build keyword frequency per year
year_word_counts: dict[str, Counter] = defaultdict(Counter)
for _, row in df.iterrows():
    words = top_words(row["abstract"])
    year_word_counts[row["year"]].update(words)

# Compare earliest vs latest year in the window, normalising by the number of
# awards in each year so a partial or smaller year does not distort the ranking
awards_per_year = df["year"].value_counts()
years_sorted = sorted(y for y in year_word_counts if y.isdigit())
if len(years_sorted) >= 2:
    early_year = years_sorted[0]
    late_year  = years_sorted[-1]
    early = year_word_counts[early_year]
    late  = year_word_counts[late_year]
    all_terms = set(early) | set(late)
    growth = {
        term: late.get(term, 0) / awards_per_year[late_year]
              - early.get(term, 0) / awards_per_year[early_year]
        for term in all_terms
    }
    top_growing = sorted(growth.items(), key=lambda x: x[1], reverse=True)[:20]
    print(f"\nFastest-growing abstract keywords ({early_year} -> {late_year}), mentions per award:")
    for term, delta in top_growing:
        print(f"  {term:<25} {delta:+.3f}")

The NSF API returns a maximum of 25 records per request, which requires pagination loops for any analysis covering more than a few dozen awards. The keyword parameter performs full-text search across titles and abstracts simultaneously; filtering on title prefix in post-processing is necessary to isolate genuine CAREER awards from other awards that happen to mention “career” in their abstracts. The fundsObligatedAmt field represents cumulative obligated funds across the award lifetime and may update as annual continuing increments are processed, so the same award pulled at different times may show different amounts. Directorate mapping requires a keyword match against the fundProgramName field, which contains division names rather than directorate codes; the mapping shown above covers the most common patterns but will require extension for edge cases involving interdisciplinary programs and the TIP directorate's newer program names.
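
The pagination and refresh behaviour described above can be isolated into two small helpers. The sketch below is one way to structure it, with the HTTP call replaced by a stand-in function so the loop can be exercised offline; all names are hypothetical, and the 1-based offset mirrors the script's own starting value.

```python
from typing import Callable, Iterator

def paginate(fetch_page: Callable[[int], list], page_size: int = 25) -> Iterator[dict]:
    """Yield records page by page until a short or empty page is returned.

    fetch_page receives a 1-based record offset and returns a list of records.
    """
    offset = 1
    while True:
        page = fetch_page(offset)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size

def latest_by_id(records):
    """Keep the last-seen record per award id, so refreshed amounts win."""
    by_id = {}
    for rec in records:
        by_id[rec["id"]] = rec
    return list(by_id.values())

# Stand-in for the HTTP call: 60 fake records served 25 at a time.
FAKE = [{"id": str(i), "fundsObligatedAmt": "100"} for i in range(60)]

def fake_fetch(offset: int) -> list:
    return FAKE[offset - 1 : offset - 1 + 25]

records = list(paginate(fake_fetch))
print(len(records), len(latest_by_id(records + records[:5])))
```

In a real run, fake_fetch would be replaced by a function that issues the GET request for the given offset; deduplicating by award id afterwards guards against the same award appearing in overlapping pulls with different obligated amounts.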

The abstract keyword trend analysis at the end of the script provides a lightweight signal for identifying research areas gaining momentum within the CAREER award corpus. Applying the same pattern to the full NSF award database—pulling all awards by year from a specific directorate and tracking keyword frequency shifts—gives a coarse but readily auditable view of how NSF's research portfolio priorities have shifted over time. For more rigorous topic modeling, the full abstract corpus downloaded via bulk export is more appropriate than the paginated API.


For the NIH companion dataset covering $40 billion in annual biomedical research funding, the Reporter API, activity code taxonomy, indirect cost rates, and payline analysis: NIH Research Grant Data: Mapping $40 Billion in Annual Biomedical Funding →

For the USASpending dataset covering $700 billion in annual federal contracts, the FPDS-NG data structure, DOD concentration, and small business set-asides: USASpending Federal Contracts: Tracing $700 Billion in Annual Government Procurement →

For the BEA national accounts data covering GDP components, industry output, personal income, and how federal research spending flows through the national accounts framework: BEA GDP Accounts: Reading the National Income and Product Accounts →