Public money funds a great deal of early research, and a recurring political question is what the public gets back for it. The honest answer is that the path from a federal grant to a marketed product is real but hard to trace — the grant, the patent, and the product live in three different agencies' databases that do not share an ID. They do, however, share a clause. This is how to follow the money from the lab bench to the pharmacy shelf using only public federal data.
The four datasets
- NIH research grants (RePORTER) and NSF awards — the funding: who got how much, for what project, with which principal investigator.
- USPTO patents — the inventions, their assignees and inventors, and — crucially — the government-interest statements that disclose federal funding.
- FDA approvals — the subset of inventions that became regulated products reaching the market.
The bridge: the Bayh-Dole government-interest clause
The 1980 Bayh-Dole Act let universities and contractors patent inventions made with federal funding, on the condition that the resulting patent disclose that funding in a “government interest” statement. That single requirement is the bridge the rest of the pipeline hangs on: a patent that reads “This invention was made with government support under Grant No. … awarded by the National Institutes of Health” can be tied directly back to a RePORTER award. Patent data exposes this statement as a searchable field; joining it to the grant data turns “federal R&D” from an abstraction into a traceable chain.
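To make the bridge concrete, here is a minimal sketch of pulling grant numbers out of a government-interest statement. The pattern is an assumption made for illustration: it targets the common NIH core project number shape (an activity code such as R01, a two-letter institute code, a six-digit serial) and will miss other agencies' formats, activity codes that are not one letter plus two digits, and statements that omit the number. The function name and the grant numbers in the sample statement are invented.

```python
import re

# Assumed, simplified pattern for an NIH core project number:
# activity code (letter + two digits, e.g. R01), institute code (two letters),
# serial number (six digits). Optional space or hyphen between the parts.
NIH_GRANT_RE = re.compile(r"\b([A-Z]\d{2})[\s-]?([A-Z]{2})[\s-]?(\d{6})\b")


def extract_nih_grants(statement: str) -> list[str]:
    """Return de-duplicated core project numbers, in order of appearance."""
    found: list[str] = []
    for activity, institute, serial in NIH_GRANT_RE.findall(statement.upper()):
        core = f"{activity}{institute}{serial}"
        if core not in found:
            found.append(core)
    return found


# Made-up statement in the standard Bayh-Dole wording; the numbers are invented.
example = (
    "This invention was made with government support under "
    "Grant Nos. R01 GM123456 and U54-CA098765 awarded by the "
    "National Institutes of Health."
)
print(extract_nih_grants(example))  # ['R01GM123456', 'U54CA098765']
```

The normalized strings can then be compared against project numbers in the grant data. Statements that name only the agency fall back to the name-based joins.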
# The elegant join: Bayh-Dole requires patents arising from federal funding to
# disclose it in a "government interest" statement, naming the funding agency
# (and often the award/contract number). That clause is the bridge from a patent
# back to the grant that paid for it.
import requests
# 1. USPTO / PatentsView: patents whose government-interest statement names NIH,
#    with assignee names returned for the later join. The search API may require
#    an API key (sent as an X-Api-Key header); add one if requests are rejected.
pv = requests.post(
    "https://search.patentsview.org/api/v1/patent/",
    json={
        "q": {"_text_phrase": {"gov_interest_statement": "National Institutes of Health"}},
        "f": ["patent_id", "patent_title", "patent_date", "assignees.assignee_organization"],
        "o": {"size": 10},
    },
    timeout=30,
).json()
# 2. NIH RePORTER: grants to one institution (PI, project, dollars).
#    The search endpoint expects its criteria in a POST body.
nih = requests.post(
    "https://api.reporter.nih.gov/v2/projects/search",
    json={"criteria": {"org_names": ["MASSACHUSETTS INSTITUTE OF TECHNOLOGY"]}},
    timeout=30,
).json()
# 3. FDA Drugs@FDA / openFDA: which of those inventions reached market as products.
# Join chain: grant (agency + institution + PI) -> patent (gov-interest + assignee
# + inventor) -> product (sponsor + ingredient). Names, not IDs, do the linking.
Where the join is hard
- Names, not identifiers. Grantee institution, patent assignee, and FDA sponsor are matched on names that drift across spellings, mergers, and subsidiaries — the same entity-resolution problem that recurs across federal data.
- Disclosure is imperfect. Not every federally-supported patent files a complete government-interest statement; the clause undercounts, so the pipeline is a floor on the funding-to-IP link, not a census.
- Inventor vs principal investigator. The patent inventor and the grant PI are often the same person under different name forms — a useful secondary join, and a noisy one.
- Long, variable lag. Grant to patent to product can span a decade or more; any year-over-year “return” comparison has to account for the lag rather than aligning calendar years.
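Two of the difficulties above, drifting names and long lags, can be sketched in a few lines. This is an illustration under stated assumptions, not a full entity-resolution method: the token list, the function names, and the 15-year default window are all hypothetical choices. A real pipeline would need a curated alias table for mergers and subsidiaries.

```python
import re

# Hypothetical list of tokens to drop before comparing names; illustrative only.
NOISE_TOKENS = {"inc", "llc", "ltd", "corp", "corporation", "co", "company", "the"}


def name_key(name: str) -> str:
    """Reduce an organization name to a crude key for matching across datasets."""
    lowered = name.lower().replace("&", " and ")
    cleaned = re.sub(r"[^a-z0-9 ]", " ", lowered)  # punctuation becomes spaces
    tokens = [t for t in cleaned.split() if t not in NOISE_TOKENS]
    return " ".join(tokens)


def plausible_lag(grant_year: int, patent_year: int, max_lag: int = 15) -> bool:
    """True if the patent falls 0..max_lag years after the grant (max_lag is an assumed cutoff)."""
    return 0 <= patent_year - grant_year <= max_lag


# Same entity under two spellings collapses to one key.
print(name_key("Pfizer Inc.") == name_key("PFIZER, INC"))
# A patent dated before the grant is rejected as a candidate link.
print(plausible_lag(2012, 2005))
```

Matching on `name_key` instead of raw strings catches spelling and punctuation drift only. The lag check narrows candidate grant-patent pairs to a window rather than aligning calendar years.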
What the pipeline reveals
Assembled, the datasets answer the return-on-public-R&D question with evidence instead of rhetoric: which institutions convert federal grants into patented inventions most often, which agencies' funding shows up behind the most-cited patents, and which approved products trace back through their patents to a public grant. It is the upstream half of the story whose downstream half — what happens once a drug is approved, marketed, and prescribed — the drug-lifecycle pipeline picks up. Both are built from public datasets, joinable by anyone willing to wrangle names and a disclosure clause.
Related writing: The Research Funding Pipeline — the funding-and-integrity view of federal research (NIH, NSF, and research-misconduct records), the stage just upstream of the patents here.
See also: The Drug Lifecycle in Federal Data — what happens to an invention after approval: marketing, prescribing, spending, and harm.