Verboten: Building a Queryable Index of Where Books Are Banned

Ask a search engine whether a particular book is banned in a particular country and you get a scattering of news stories, a Wikipedia paragraph, and an advocacy press release—rarely a straight answer, almost never a dated, sourced one. The facts exist. They sit in government gazettes, classification-board decisions, court rulings, library-association indexes, and a hundred news archives. What does not exist, in one place, is the join: a structured record of which title, banned where, when, by whom, and on what stated grounds. Verboten is that join.

It is a queryable index of book censorship worldwide: 19,283 titles that have been banned or restricted across 119 countries, recorded as 34,987 dated ban events with 35,230 source citations. It is part of Voidly, which measures what the network hides; Verboten records what the censor bans on paper. This article explains how it is built, how it counts, and why it is designed for machines first.

The source: an open censorship core

Verboten does not scrape. It is built on the banned-books.org Open Censorship Core, released under CC BY 4.0 with a citable Zenodo DOI. That open core is deliberately the verifiable layer of a larger catalogue: the structured facts of who banned what, where, when, and why, plus the reason taxonomy and the source citations that let each one be checked. It excludes the commercial dataset's editorial prose, cover images, and enrichment—which is exactly what makes it clean to build on. Verboten adds normalization, per-country and per-title pages, a lookup index, and a machine API; the underlying facts, and their license, are inherited intact.

Counting censorship honestly

The single most important rule in this data is how you count. A ban is recorded as an event, and one famous title can generate hundreds of events: a US book challenged across many school districts produces a separate row for each district. Repetition of this kind inflates raw event rows two- or threefold against the number of distinct titles actually affected. Verboten therefore ranks on distinct titles and distinct countries, never on raw event counts. The headline title and country figures are de-duplicated; the event count is reported separately and labelled as such.
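
The counting rule can be sketched in a few lines. This is a minimal illustration, not Verboten's actual pipeline: the row layout (title, country code, venue) and the placeholder titles are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical event rows: (title, ISO country code, venue). Placeholder data only.
events = [
    ("Book A", "US", "district 1"),
    ("Book A", "US", "district 2"),
    ("Book A", "US", "district 3"),
    ("Book A", "CA", "national"),
    ("Book B", "US", "district 1"),
]

def rank_by_distinct_countries(rows):
    """Rank titles by how many distinct countries appear, ignoring repeat events."""
    countries = defaultdict(set)
    for title, iso, _venue in rows:
        countries[title].add(iso)  # a set absorbs repeated (title, country) pairs
    # Sort by distinct-country count (descending), then by title for stable output.
    return sorted(
        ((title, len(isos)) for title, isos in countries.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )

ranking = rank_by_distinct_countries(events)
distinct_titles = len({title for title, _iso, _venue in events})
raw_events = len(events)  # reported separately, never used for ranking

print(ranking, distinct_titles, raw_events)
```

Here five raw events collapse to two distinct titles, and "Book A" ranks on two countries rather than on its four rows.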

Just as important is the distinction between kinds of act. Verboten keeps three apart and never merges them into a single “banned” total: banned (outright prohibition—23,924 events), restricted (lawful but constrained: no sale to minors, removed from a school library—10,854 events), and challenged (a request to ban that did not succeed—recorded, but never counted as a ban). Collapsing a restriction into a ban, or a failed challenge into either, is how censorship statistics become both wrong and litigable.
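
One way to keep the three kinds of act apart in code is to tally them under separate keys and never add them together. The sketch below assumes each event carries an action string with one of the three values named above; the enum and function names are illustrative, not part of Verboten.

```python
from collections import Counter
from enum import Enum

class Action(Enum):
    BANNED = "banned"          # outright prohibition
    RESTRICTED = "restricted"  # lawful but constrained
    CHALLENGED = "challenged"  # a request to ban that did not succeed

def tally(actions):
    """Count events per action type; deliberately returns no merged total."""
    counts = Counter(Action(a) for a in actions)  # raises ValueError on unknown values
    return {action.value: counts.get(action, 0) for action in Action}

# Placeholder action strings for illustration.
sample = ["banned", "restricted", "banned", "challenged"]
totals = tally(sample)

# Only outright prohibitions count as bans; challenges are recorded but excluded.
ban_events = totals["banned"]
print(totals, ban_events)
```

Rejecting unknown action strings up front means a mislabelled row fails loudly instead of being silently folded into one of the three buckets.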

The data model

Each work carries a title, author, first-publication year, and original language. Each ban event ties a work to a country, the banning authority, the action type and status (active, historical, rescinded), the scope (national, school, government, customs, prison), the year it started and—where lifted—ended, one or more stated reasons drawn from a controlled taxonomy, and a citation with its own verification status. The reason taxonomy is what turns the data from a list into an instrument: you can ask not just where a book is banned but why, and compare the stated grounds across countries and decades.
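
The model described above might be expressed as the following records. This is a sketch under stated assumptions: the class and field names are hypothetical and need not match Verboten's JSON, and the sample work and events are invented placeholders. The helper shows the kind of question the reason taxonomy makes possible, grouping stated reasons by decade.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Work:
    title: str
    author: str
    first_published: Optional[int]
    original_language: Optional[str]

@dataclass(frozen=True)
class Citation:
    source: str
    verification_status: str

@dataclass(frozen=True)
class BanEvent:
    work: Work
    country: str                 # ISO 3166-1 alpha-2 code
    authority: str
    action: str                  # "banned", "restricted", or "challenged"
    status: str                  # "active", "historical", or "rescinded"
    scope: str                   # e.g. "national", "school", "customs"
    year_started: Optional[int]
    year_ended: Optional[int]    # None while the ban is still in force
    reasons: Tuple[str, ...]     # labels from the controlled taxonomy
    citation: Citation

def reasons_by_decade(events):
    """Group stated reasons by the decade in which each event started."""
    out = defaultdict(Counter)
    for event in events:
        if event.year_started is None:
            continue  # undated events cannot be placed in a decade
        out[event.year_started // 10 * 10].update(event.reasons)
    return out

# Placeholder records for illustration only.
work = Work("Example Title", "Example Author", 1950, "en")
cite = Citation("example gazette entry", "verified")
events = [
    BanEvent(work, "US", "Example Board", "banned", "historical", "national",
             1962, 1970, ("political content",), cite),
    BanEvent(work, "IR", "Example Ministry", "banned", "active", "national",
             1968, None, ("political content", "sexual content"), cite),
    BanEvent(work, "US", "Example District", "restricted", "active", "school",
             2023, None, ("LGBTQ+ content",), cite),
]
by_decade = reasons_by_decade(events)
print(dict(by_decade))
```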

What the data shows

The most-banned book in the world, by distinct countries, is The Satanic Verses, banned or restricted in 22 countries since 1988; Lady Chatterley's Lover, 1984, Lolita, and Animal Farm follow. The most common stated reason across the whole dataset is political content (cited against 9,813 distinct titles), then immorality, sexual content, and LGBTQ+ content, the last concentrated almost entirely in the recent US school-board surge, which dominates the most recent decade. The record runs back to the 1510s, so the same instrument that captures a 2024 Texas school-district removal also captures five centuries of state and church prohibition.

Built for machines

Verboten's largest intended user is not a human reader but an AI agent that needs a verifiable answer rather than a hallucinated one. So every answer is a plain, cacheable static file. There is no application server, no API key, and no rate limit. A manifest at /verboten/api/index.json lists the dataset stats and the endpoint map; a per-country summary lives at /verboten/api/country/{ISO}.json for all 119 countries; the 200 most-banned titles each have a full source-cited record at /verboten/api/book/{slug}.json; and a single lookup index at /verboten/search-index.json maps every one of the ~19,300 titles to the countries that ban it. That last file answers the query the whole project exists for, "is this book banned in that country?", for any title, not just the famous ones.

A worked example

Three short calls cover the common questions: resolve a title to its full ban record, summarize a country, and look up an arbitrary book by name. Because the endpoints are static JSON, the same code works from a notebook, a serverless function, or a tool call inside an agent.

import requests

BASE = "https://ai-analytics.org/verboten"

# Every answer is a static JSON file. No key, no rate limit, no server.
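def fetch(path, timeout=30):
    # Raise on HTTP errors (e.g. a 404 for an unknown code or slug) before parsing JSON.
    resp = requests.get(f"{BASE}{path}", timeout=timeout)
    resp.raise_for_status()
    return resp.json()

def country(iso):
    # iso = ISO 3166-1 alpha-2, e.g. "IR", "US", "CN"
    return fetch(f"/api/country/{iso}.json")

def book(slug):
    # slug for the 200 most-banned titles, e.g. "the-satanic-verses"
    return fetch(f"/api/book/{slug}.json")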

# "Is The Satanic Verses banned in Iran, and on what grounds?"
b = book("the-satanic-verses")
print(b["title"], "banned/restricted in", b["countryCount"], "countries")
for x in b["bans"]:
    if x["country"] == "IR":
        print(x["countryName"], x["action"], x.get("yearStarted"), x["reasons"])

# The reasons books are most often banned in Iran
ir = country("IR")
for r in ir["topReasons"][:5]:
    print(r["count"], r["label"])

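# Resolve any of the ~19,300 titles by lookup index (title -> countries)
idx = requests.get(f"{BASE}/search-index.json", timeout=60).json()
# Default to None so a title missing from the index does not raise StopIteration.
hit = next((x for x in idx["books"] if x["t"] == "1984"), None)
if hit:
    print("1984 is banned/restricted in", hit["n"], "countries:", hit["c"])
else:
    print("1984 not found in the index")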
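
For agent use, the index lookup can be wrapped in a small helper that answers the yes/no question directly. This is a sketch, not part of Verboten's API: it reuses the "books", "t", and "c" keys from the example above, and it assumes that "c" holds a list of ISO 3166-1 alpha-2 codes, which the index itself should be checked against. The sample index is placeholder data so the helper can be tried offline.

```python
def is_banned(index, title, iso):
    """Return True or False if the title is indexed, or None if it is not found."""
    wanted = title.casefold()  # case-insensitive match on the title
    for entry in index["books"]:
        if entry["t"].casefold() == wanted:
            # Assumption: entry["c"] is a list of ISO alpha-2 country codes.
            return iso.upper() in entry["c"]
    return None  # unknown title: "no record", which is not the same as "not banned"

# Placeholder index in the assumed shape; not real Verboten data.
sample_index = {"books": [{"t": "Example Title", "n": 2, "c": ["IR", "US"]}]}

print(is_banned(sample_index, "example title", "ir"))
print(is_banned(sample_index, "Example Title", "FR"))
print(is_banned(sample_index, "Missing Title", "IR"))
```

Returning None for an unknown title keeps the absence of a record distinct from a documented absence of a ban, which matters for the coverage limits discussed below.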

The honest limits

A censorship index is only as complete as the records it can verify, and that completeness is uneven by construction. The data is strongest where advocacy groups, library associations, and a free press document bans in detail—which means the United States, with its school-board challenges, is heavily represented, while the most opaque censors are precisely the ones least likely to publish a record Verboten can cite. A low count for a closed country is therefore a statement about documentation, not about freedom. Every event carries a verification status, and the counting rules above keep the visible surface honest, but the dataset does not claim to be a complete census of world censorship. It is the most checkable structured view of it that exists openly—and, served as static JSON under CC BY 4.0, the easiest for the next tool to build on.

Related writing

What the World Bans, and Why — The companion data-story: what the assembled record actually reveals — political content as the leading reason, the American concentration of LGBTQ+ bans, and the 2020s surge.

Verboten — The Global Banned-Books Index — The index itself: search any title, browse all 119 countries, and read the source-cited record for the most-banned books, with the machine API documented in full.

Voidly — The Global Censorship Index — The network side of censorship: measured internet blocking, throttling, and shutdowns across 200 countries, the project Verboten extends from the wire to the page.

Voidly's country-level censorship score — How per-measurement signals aggregate into a per-country index; the same discipline of honest, weighted aggregation that governs how Verboten counts bans by distinct title and country.