
Cross-Source Censorship Verification: Reconciling OONI, CensoredPlanet, and IODA

10 min read · AI Analytics
Tags: Censorship, Voidly, OSINT, Verification

Voidly doesn't call something a “verified incident” on the basis of our own probe data alone. The public dataset promotes an anomaly to “Verified” only once at least one independent measurement project (OONI, CensoredPlanet, or IODA) agrees. This post describes how that reconciliation works: what the three sources measure, how their data formats differ, how we align time windows, and what happens when they disagree.

Why three sources

Each project has different blind spots:

  • OONI runs on volunteer devices inside affected countries. Coverage is excellent where volunteers exist (Turkey, Russia, Iran, Egypt) but sparse in regions with fewer activists or higher risk to anyone running a probe (Central Asia, sub-Saharan Africa outside major metros). OONI data is available in near-real time via the OONI API.
  • CensoredPlanet (University of Michigan) measures remotely: it sends queries from a fixed set of machines outside the affected countries to servers inside them, such as open DNS resolvers and echo or web servers. This makes it less subject to volunteer distribution bias, but because it measures from outside looking in, it doesn't see in-country BGP routing. CP data is published as bulk S3 datasets, updated daily, rather than through a real-time API.
  • IODA (Georgia Tech) combines BGP routing-table monitoring with active probing (ICMP, TCP) to detect connectivity outages at the AS level. IODA catches total shutdowns and large-scale BGP withdrawals that DNS or TLS tests miss. Its data is available via an API and updates quickly, since BGP changes propagate globally within minutes.

A Voidly probe adds a fourth perspective from inside the jurisdiction: TLS handshake analysis, HTTP response body matching, and throttling detection. No single source sees everything; the four signals, cross-checked against one another, are substantially more reliable than any one alone.

The data format problem

The three sources use completely different schemas, time resolutions, and targeting models. Before any correlation is possible, the raw data has to be normalized:

# OONI measurement (simplified)
{
  "test_name": "web_connectivity",
  "measurement_start_time": "2024-09-15 14:32:01",
  "probe_cc": "TR",
  "input": "https://twitter.com/",
  "test_keys": {
    "blocking": "dns",
    "dns_experiment_failure": "dns_nxdomain_error",
    "accessible": false
  }
}

# CensoredPlanet Quack-v2 (simplified)
{
  "vantage":     "185.220.x.x",
  "location_cc": "TR",
  "domain":      "twitter.com",
  "time":        "1726404721",   # Unix timestamp
  "success":     false,
  "error":       "connection_reset",
  "protocol":    "HTTPS"
}

# IODA alert (simplified)
{
  "entity": { "type": "asn", "code": "9121", "name": "Turk Telekom" },
  "from":   1726390000,
  "until":  1726415000,
  "level":  "critical",
  "score":  0.92
}

The normalizer maps each format to a canonical internal event:

type CensorshipEvent = {
  source:      'voidly' | 'ooni' | 'censored_planet' | 'ioda';
  country:     string;   // ISO 3166-1 alpha-2
  asn?:        number;
  domain?:     string;   // absent for IODA AS-level events
  block_type:  'DNS' | 'TLS' | 'HTTP' | 'BGP' | 'THROTTLE' | 'SHUTDOWN';
  confidence:  number;   // 0–1 from source's own scoring
  start:       Date;
  end?:        Date;
  raw:         unknown;  // original source record
};
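As an illustration of the normalizer, here is a minimal sketch that maps the three simplified records above onto CensorshipEvent. The type mappings (OONI `blocking` values, CP protocols, IODA alerts to 'BGP') and the placeholder confidence for OONI and CP records, which carry no score of their own in these samples, are assumptions for illustration, not Voidly's actual tables. The part that is easy to get wrong is time: OONI gives a UTC string with no zone suffix, CP a Unix-seconds string, and IODA Unix-seconds numbers.

```typescript
// Canonical event type, repeated from above so this sketch compiles on its own
type BlockType = 'DNS' | 'TLS' | 'HTTP' | 'BGP' | 'THROTTLE' | 'SHUTDOWN';
type CensorshipEvent = {
  source: 'voidly' | 'ooni' | 'censored_planet' | 'ioda';
  country: string;
  asn?: number;
  domain?: string;
  block_type: BlockType;
  confidence: number;
  start: Date;
  end?: Date;
  raw: unknown;
};

// Only the fields of the simplified sample records that this sketch reads
type OoniRecord = {
  measurement_start_time: string;   // "YYYY-MM-DD HH:MM:SS", UTC
  probe_cc: string;
  input: string;
  test_keys: { blocking: string | false | null };
};
type CpRecord = { location_cc: string; domain: string; time: string; success: boolean; protocol: string };
type IodaAlert = { entity: { type: string; code: string }; from: number; until: number; score: number };

// Hypothetical values: illustrative mappings and a placeholder score
const PLACEHOLDER_CONFIDENCE = 0.5;
const OONI_BLOCKING = new Map<string, BlockType>([['dns', 'DNS'], ['http-failure', 'HTTP'], ['http-diff', 'HTTP']]);
const CP_PROTOCOL = new Map<string, BlockType>([['DNS', 'DNS'], ['HTTP', 'HTTP'], ['HTTPS', 'TLS']]);

function normalizeOONI(m: OoniRecord): CensorshipEvent | null {
  const b = m.test_keys.blocking;
  const type = typeof b === 'string' ? OONI_BLOCKING.get(b) : undefined;
  if (!type) return null;                         // not blocked, or a value we don't map
  return {
    source: 'ooni',
    country: m.probe_cc,
    domain: new URL(m.input).hostname,            // "https://twitter.com/" -> "twitter.com"
    block_type: type,
    confidence: PLACEHOLDER_CONFIDENCE,
    start: new Date(m.measurement_start_time.replace(' ', 'T') + 'Z'),  // mark as UTC
    raw: m,
  };
}

function normalizeCP(r: CpRecord): CensorshipEvent | null {
  const type = CP_PROTOCOL.get(r.protocol);
  if (r.success || !type) return null;            // reachable, or a protocol we don't map
  return {
    source: 'censored_planet',
    country: r.location_cc,
    domain: r.domain,
    block_type: type,
    confidence: PLACEHOLDER_CONFIDENCE,
    start: new Date(Number(r.time) * 1000),       // Unix seconds serialized as a string
    raw: r,
  };
}

// The IODA sample has no country code, so the caller passes the one it queried
function normalizeIODA(a: IodaAlert, country: string): CensorshipEvent {
  return {
    source: 'ioda',
    country,
    asn: a.entity.type === 'asn' ? Number(a.entity.code) : undefined,
    block_type: 'BGP',
    confidence: a.score,                          // IODA supplies its own score
    start: new Date(a.from * 1000),
    end: new Date(a.until * 1000),
    raw: a,
  };
}
```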

Time-window alignment

Censorship events rarely start and stop at clean boundaries. A single blocking event might appear as: 200 OONI measurements spread over 4 hours; a CensoredPlanet batch scan that ran once during the window; and an IODA alert with 15-minute resolution. Correlating these requires a time-window alignment strategy rather than an exact timestamp match.

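We use a 4-hour corroboration window. An event with no explicit end time is treated as lasting four hours from its start, and two events corroborate each other when their intervals overlap: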

function overlaps(eventA: CensorshipEvent, eventB: CensorshipEvent): boolean {
  const windowMs = 4 * 60 * 60 * 1000;   // 4 hours
  const startA = eventA.start.getTime();
  const endA   = (eventA.end ?? new Date(startA + windowMs)).getTime();
  const startB = eventB.start.getTime();
  const endB   = (eventB.end ?? new Date(startB + windowMs)).getTime();

  // Events overlap if neither ends before the other starts
  return startA <= endB && startB <= endA;
}

For IODA AS-level events, we additionally check that the ASN of the Voidly probe that detected the anomaly (or the ASN of the domain's DNS resolver) matches an AS that IODA reports as affected. Country-code agreement is required for all non-IODA correlations.
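The ASN check might look like the following sketch. The field names are hypothetical, and the handling of country-level IODA alerts is our assumption rather than something this post specifies.

```typescript
// Minimal shape of the entity field in an IODA alert, as in the sample above
type IodaEntity = { type: string; code: string };

// ASNs tied to the Voidly observation (hypothetical field names)
type VoidlyAsnContext = {
  probeAsn: number;       // ASN of the probe that flagged the anomaly
  resolverAsn?: number;   // ASN of the domain's DNS resolver, when known
};

function iodaCorroborates(entity: IodaEntity, ctx: VoidlyAsnContext, country: string): boolean {
  if (entity.type === 'asn') {
    const asn = Number(entity.code);
    return asn === ctx.probeAsn || asn === ctx.resolverAsn;
  }
  // Assumption: a country-level IODA alert counts when its country code
  // matches the Voidly event's country
  if (entity.type === 'country') {
    return entity.code.toUpperCase() === country.toUpperCase();
  }
  return false;   // other entity types (e.g. regions) are ignored in this sketch
}
```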

The correlation engine

When a Voidly probe flags an anomaly above its local confidence threshold, the reconciler queries the other sources for corroborating events within the 4-hour window, matching on country code, domain (where applicable), and overlap:

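type Source = CensorshipEvent['source'];

type CorroborationResult = {
  corroborated: boolean;
  sources:      Source[];   // each agreeing source listed once
  confidence:   number;
};

// Source adapters and scoring are implemented elsewhere in the reconciler;
// these signatures are inferred from the calls below.
declare function queryOONI(cc: string, domain: string | undefined, at: Date): Promise<CensorshipEvent[]>;
declare function queryCensoredPlanet(cc: string, domain: string | undefined, at: Date): Promise<CensorshipEvent[]>;
declare function queryIODA(cc: string, asn: number | undefined, at: Date): Promise<CensorshipEvent[]>;
declare function aggregateConfidence(primary: CensorshipEvent, corroborating: CensorshipEvent[]): number;

async function correlate(voidlyEvent: CensorshipEvent): Promise<CorroborationResult> {
  // Query all three sources in parallel; each adapter filters by country and
  // by domain (OONI, CP) or ASN (IODA)
  const [ooniHits, cpHits, iodaHits] = await Promise.all([
    queryOONI(voidlyEvent.country, voidlyEvent.domain, voidlyEvent.start),
    queryCensoredPlanet(voidlyEvent.country, voidlyEvent.domain, voidlyEvent.start),
    queryIODA(voidlyEvent.country, voidlyEvent.asn, voidlyEvent.start),
  ]);

  // Keep only hits whose time interval overlaps the Voidly event
  const corroborating = [...ooniHits, ...cpHits, ...iodaHits]
    .filter(e => overlaps(voidlyEvent, e));

  return {
    corroborated: corroborating.length > 0,
    // Deduplicate: 200 matching OONI measurements are still one source
    sources:      Array.from(new Set(corroborating.map(e => e.source))),
    confidence:   aggregateConfidence(voidlyEvent, corroborating),
  };
}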
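The query adapters aren't shown. As a sketch of what `queryOONI` might send, the helper below builds a request to the OONI API's `/api/v1/measurements` endpoint covering four hours either side of the Voidly event, which is how far `overlaps()` reaches for events without an end time. The parameter names (`probe_cc`, `test_name`, `domain`, `since`, `until`) reflect our understanding of the v1 API and should be checked against the current OONI API documentation; the helper only builds the URL and does not fetch or page through results.

```typescript
const OONI_MEASUREMENTS = 'https://api.ooni.io/api/v1/measurements';
const WINDOW_MS = 4 * 60 * 60 * 1000;   // same 4-hour window as overlaps()

// "YYYY-MM-DDTHH:MM:SS" in UTC, without milliseconds or a zone suffix
function ooniTime(d: Date): string {
  return d.toISOString().slice(0, 19);
}

function ooniQueryUrl(country: string, domain: string | undefined, at: Date): string {
  const params = new URLSearchParams({
    probe_cc: country,
    test_name: 'web_connectivity',
    since: ooniTime(new Date(at.getTime() - WINDOW_MS)),
    until: ooniTime(new Date(at.getTime() + WINDOW_MS)),
  });
  if (domain) params.set('domain', domain);   // omitted for AS-level lookups
  return `${OONI_MEASUREMENTS}?${params.toString()}`;
}
```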

Confidence scoring

When multiple sources agree, confidence increases, but not linearly. Agreement between two similar methods counts for less than agreement between independent ones: OONI and CensoredPlanet both test whether a specific site is reachable (OONI from inside the country, CP from outside), so their results are highly correlated, whereas OONI (in-country DNS) and IODA (BGP table monitoring) observe different layers. The confidence formula weights source independence:

const SOURCE_INDEPENDENCE_WEIGHT: Record<string, number> = {
  'voidly + ooni':             0.75,  // similar method, different vantage
  'voidly + censored_planet':  0.70,  // similar method, external vantage
  'voidly + ioda':             0.90,  // different method, BGP-level
  'voidly + ooni + ioda':      0.97,  // three independent methods
  'all four':                  0.99,  // maximum
};

// Combined confidence: Voidly's raw score scaled by the weight of the
// source combination that agreed
const confidence = voidlyRawConfidence * SOURCE_INDEPENDENCE_WEIGHT[sourceCombination];
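The lookup key `sourceCombination` has to be derived from whichever sources agreed. A minimal sketch, assuming keys are built in a fixed order; the table as published has no entry for Voidly alone or for combinations such as voidly + censored_planet + ioda, so the sketch returns `undefined` for those rather than inventing a weight.

```typescript
type ExternalSource = 'ooni' | 'censored_planet' | 'ioda';

// Weights copied from the table above
const SOURCE_INDEPENDENCE_WEIGHT: Record<string, number> = {
  'voidly + ooni':            0.75,
  'voidly + censored_planet': 0.70,
  'voidly + ioda':            0.90,
  'voidly + ooni + ioda':     0.97,
  'all four':                 0.99,
};

// Fixed order so {ioda, ooni} and {ooni, ioda} produce the same key
const KEY_ORDER: ExternalSource[] = ['ooni', 'censored_planet', 'ioda'];

function combinationKey(sources: Iterable<ExternalSource>): string {
  const present = new Set(sources);
  if (present.size === KEY_ORDER.length) return 'all four';
  return ['voidly', ...KEY_ORDER.filter(s => present.has(s))].join(' + ');
}

// undefined when the table has no weight for this combination
function combinedConfidence(voidlyRaw: number, sources: Iterable<ExternalSource>): number | undefined {
  const weight: number | undefined = SOURCE_INDEPENDENCE_WEIGHT[combinationKey(sources)];
  return weight === undefined ? undefined : voidlyRaw * weight;
}
```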

An event reaches “Verified” status in the public dataset at confidence ≥ 0.75. Events with confidence from 0.40 up to 0.75 are published as “Corroborated.” Below 0.40, events are held as internal “Observed” signals: they inform our 7-day forecasting model but don't appear in the public dataset.
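The tier thresholds reduce to a small classifier. This sketch treats the 0.40 boundary as inclusive, which is our reading of “between 0.40 and 0.75”; the function names are ours.

```typescript
type Tier = 'Verified' | 'Corroborated' | 'Observed';

const VERIFIED_MIN = 0.75;
const CORROBORATED_MIN = 0.40;   // assumed inclusive

function tierFor(confidence: number): Tier {
  if (confidence >= VERIFIED_MIN) return 'Verified';
  if (confidence >= CORROBORATED_MIN) return 'Corroborated';
  return 'Observed';
}

// Observed events stay internal (they feed the forecasting model)
function isPublished(tier: Tier): boolean {
  return tier !== 'Observed';
}
```

Worked through with the weights above: a Voidly raw confidence of 0.9 corroborated by IODA gives 0.9 × 0.90 = 0.81, which is Verified, while the same raw score corroborated by OONI alone gives 0.9 × 0.75 = 0.675, which is Corroborated. Because the voidly + censored_planet weight is 0.70, that pair alone can never reach the 0.75 Verified threshold under this formula.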

Handling disagreements

Sources disagree in several characteristic patterns:

Voidly flags, no external corroboration

The most common reason is that the block is ISP-specific (a single ASN) and neither OONI nor CP has a vantage point on that ISP. We hold the event at “Observed” and check whether subsequent Voidly probe runs on the same ASN continue to flag the anomaly. A sustained single-probe pattern (the anomaly flagged in 4 or more consecutive 5-minute windows) raises confidence enough to publish the event as “Corroborated” without external sources.
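The sustained single-probe rule can be checked by bucketing flagged probe runs into 5-minute windows and looking for four adjacent windows. This sketch assumes windows are aligned to wall-clock 5-minute boundaries and that the input contains only runs from one probe on one ASN that flagged the anomaly; a window in which the probe didn't run, or ran without flagging, simply breaks the streak.

```typescript
const WINDOW_MS = 5 * 60 * 1000;   // 5-minute probe windows
const REQUIRED_RUN = 4;            // "4+ consecutive windows"

// flaggedAtMs: timestamps of runs (one probe, one ASN) that flagged the anomaly
function longestConsecutiveRun(flaggedAtMs: number[]): number {
  // Map each run to its window index, drop duplicates, sort ascending
  const windows = Array.from(new Set(flaggedAtMs.map(t => Math.floor(t / WINDOW_MS))))
    .sort((a, b) => a - b);
  let best = 0;
  let run = 0;
  let prev = Number.NaN;
  for (const w of windows) {
    run = w === prev + 1 ? run + 1 : 1;   // extend the streak or start a new one
    best = Math.max(best, run);
    prev = w;
  }
  return best;
}

function sustainedSingleProbe(flaggedAtMs: number[]): boolean {
  return longestConsecutiveRun(flaggedAtMs) >= REQUIRED_RUN;
}
```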

OONI flags, Voidly doesn't

In some countries OONI runs on many more devices than Voidly does. If OONI shows 50+ independent measurements all returning a block page, we treat that as primary evidence even without a matching Voidly measurement. Corroboration doesn't have to start from Voidly.
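A sketch of the OONI-first rule, assuming each OONI result has already been reduced to a `blockPage` flag (a hypothetical field; deciding whether a response is a block page is its own problem) and treating distinct `measurement_uid` values as independent measurements. The 50-measurement threshold comes from the text; the rest is illustrative.

```typescript
type OoniResult = {
  measurement_uid: string;   // per-measurement identifier
  blockPage: boolean;        // hypothetical: response matched a known block page
};

const OONI_PRIMARY_MIN = 50;

function ooniIsPrimaryEvidence(results: OoniResult[]): boolean {
  // Count each measurement once, even if it was fetched twice
  const unique = new Map(results.map((r): [string, OoniResult] => [r.measurement_uid, r]));
  if (unique.size < OONI_PRIMARY_MIN) return false;
  // "All returning a block page": one non-block-page result disqualifies
  return Array.from(unique.values()).every(r => r.blockPage);
}
```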

IODA flags a total outage, DNS/TLS tests pass

Sometimes BGP routing tables show an AS withdrawing a prefix, but tests against IP addresses in that prefix still succeed from other vantage points. This is typically partial routing: the withdrawal has propagated to some but not all BGP peers. We flag the event as “Observed” with type BGP_WITHDRAWAL and a note on the propagation fraction, the share of BGP peers that saw the withdrawal.
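The propagation fraction is the share of BGP peers that no longer carry a route to the prefix. This sketch assumes the peer counts are available as inputs (how they are obtained is outside its scope) and only produces the “Observed” BGP_WITHDRAWAL record for the partial case described above: some peers withdrew, and external tests against the prefix still succeed. The record shape is hypothetical.

```typescript
type BgpWithdrawal = {
  prefix: string;
  peersTotal: number;      // BGP peers observed for this prefix (hypothetical input)
  peersWithdrawn: number;  // peers that no longer carry a route to it
};

type WithdrawalObservation = {
  tier: 'Observed';
  block_type: 'BGP_WITHDRAWAL';
  propagation_fraction: number;
  note: string;
};

function assessPartialWithdrawal(w: BgpWithdrawal, externalTestsSucceed: boolean): WithdrawalObservation | null {
  if (w.peersTotal <= 0) return null;
  const fraction = w.peersWithdrawn / w.peersTotal;
  // Only the partial case: some (not all) peers withdrew, yet tests still succeed
  if (fraction <= 0 || fraction >= 1 || !externalTestsSucceed) return null;
  return {
    tier: 'Observed',
    block_type: 'BGP_WITHDRAWAL',
    propagation_fraction: fraction,
    note: `${w.prefix}: withdrawal seen by ${Math.round(fraction * 100)}% of ${w.peersTotal} peers`,
  };
}
```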

Sources agree on blocking, disagree on type

OONI might classify a block as “DNS” (no resolution) while CensoredPlanet sees a successful DNS resolution but a connection reset. This often indicates a multi-layer block: DNS tampering on some resolvers, TCP RST injection on others. The reconciler merges these into a compound block type, DNS + TLS_INTERFERENCE, which is more accurate than either classification alone.
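Merging disagreeing classifications into a compound type can be as simple as collecting the set of observed layers and printing them in a fixed order. In this sketch the canonical 'TLS' type is displayed as TLS_INTERFERENCE to match the label used above; that display mapping and the layer order are assumptions.

```typescript
type BlockType = 'DNS' | 'TLS' | 'HTTP' | 'BGP' | 'THROTTLE' | 'SHUTDOWN';
type Observation = { source: string; block_type: BlockType };

// Fixed, lower-layer-first order so the same set of layers always yields the same label
const LAYER_ORDER: BlockType[] = ['SHUTDOWN', 'BGP', 'DNS', 'TLS', 'HTTP', 'THROTTLE'];

// Assumed display labels; types without an entry are printed as-is
const LABEL: Partial<Record<BlockType, string>> = { TLS: 'TLS_INTERFERENCE' };

function compoundBlockType(observations: Observation[]): string {
  const seen = new Set(observations.map(o => o.block_type));
  return LAYER_ORDER
    .filter(t => seen.has(t))
    .map(t => LABEL[t] ?? t)
    .join(' + ');
}
```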

Latency budget

We target a “probe-to-verified” latency of under 2 hours for events that reach the verified tier. In practice, the bottleneck is CensoredPlanet: CP publishes daily batch exports rather than a real-time API, so CP corroboration is unavailable for events in the most recent 12–24 hours. OONI and IODA both have near-real-time APIs (sub-5-minute latency), so for most events the effective corroboration window is OONI + IODA + Voidly. CP serves as retroactive confirmation that typically arrives the following day.

Detection → corroboration latency (typical):
  Probe anomaly detected:              T+0
  OONI query (API):                    T+5min
  IODA query (API):                    T+5min
  Internal confidence assessment:      T+6min
  Published as Corroborated/Verified:  T+8min
  CensoredPlanet retroactive check:    T+12–24h

Coverage gaps we know about

  • Rural and mobile-only connectivity. Voidly probes are biased toward fixed-line broadband. Blocking that targets mobile ASNs specifically (common in sub-Saharan Africa and South Asia) may pass undetected if we lack a probe on the affected mobile ISP.
  • Throttling detection is harder than blocking. Throttling leaves DNS and TLS nominally working; only bandwidth measurements reveal the degradation. We measure bandwidth on every probe run, but the baseline varies enough between ISPs that the classifier has a higher false-negative rate for throttling than for hard blocks.
  • HTTPS vs HTTP split enforcement. Some ISPs block the HTTP version of a site but not the HTTPS version. Our test list uses HTTPS URLs by default, so a few country-level HTTP-only blocks were missed until a researcher reported them manually.
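To show why per-ISP baselines matter for throttling, here is a simple check that compares a probe's measured bandwidth against the same ASN's own history using a z-score. This is a generic statistical technique swapped in for illustration, not Voidly's classifier, and the -3 threshold is arbitrary.

```typescript
type Baseline = { mean: number; stdDev: number };

// Population mean and standard deviation of past bandwidth samples for one ASN
function baselineFor(samplesMbps: number[]): Baseline {
  const n = samplesMbps.length;
  const mean = samplesMbps.reduce((sum, x) => sum + x, 0) / n;
  const variance = samplesMbps.reduce((sum, x) => sum + (x - mean) ** 2, 0) / n;
  return { mean, stdDev: Math.sqrt(variance) };
}

// Flags a reading that sits far below this ASN's own baseline
function looksThrottled(currentMbps: number, historyMbps: number[], zThreshold = -3): boolean {
  if (historyMbps.length < 2) return false;   // too little history to judge
  const { mean, stdDev } = baselineFor(historyMbps);
  if (stdDev === 0) return false;             // no spread to measure against
  return (currentMbps - mean) / stdDev <= zThreshold;
}
```

With a history around 100 Mbps, a 40 Mbps reading is far outside the usual spread and is flagged; on an ISP whose normal is around 10 Mbps, a 9 Mbps reading is not. A single global cutoff such as “below 50 Mbps” would flag every measurement on the second ISP.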

For how the probe application that generates these measurements is built: The Voidly Probe: Tauri + boringtun network measurement at the operator's edge →

For a deep-dive on the IODA BGP signal used here: BGP routing signals and internet shutdown detection: how Voidly uses IODA data →

For how the per-measurement anomaly classification that feeds this reconciler works: The Voidly anomaly classifier: five interference classes and why we optimize for recall →

For how the OONI archive used here was processed into a structured dataset: Building the OONI historical corpus: 1.66M downloads and schema normalization →

For the model that consumes these verified events as training data: Seven-day internet shutdown forecasting: how Voidly predicts connectivity outages →

For the full schema of the corroboration fields in the published dataset: The Voidly measurement dataset: field-by-field schema reference →

For how Voidly distinguishes commercial geoblocking from government censorship before corroboration: Geoblocking vs. censorship: how Voidly distinguishes licensing restrictions, CDN geofencing, and GDPR blocks →

For the TLS layer of Voidly's censorship detection — certificate forgery, SNI blocking, and handshake timing analysis: How Voidly measures TLS censorship: certificate forgery, SNI blocking, and handshake interference →

For the seven-type taxonomy that classifies what kind of interference cross-source verification is detecting: Classifying internet interference: Voidly's seven-type taxonomy from TCP RST to application-layer blocking →