How Voidly measures bandwidth throttling: timing signals, body truncation, and the calibration problem
The five interference types Voidly classifies are not equally difficult to detect. DNS tampering, TLS interference, HTTP blocking, and BGP withdrawal all produce binary failures: the connection either succeeds normally or fails in a way that is structurally distinct from legitimate network behavior. Throttling is different. The connection works. The TLS handshake completes. The HTTP response arrives. But the content transfers at a fraction of its normal rate, or the body is cut short after a few kilobytes, or the TTFB is eight standard deviations above the baseline that every other request to this domain from this ASN has established over the past 90 days.
This makes throttling a continuous signal on a spectrum from “slightly slower than usual” to “effectively blocked.” The decision boundary between congestion and deliberate throttling sits in a feature region where the two phenomena genuinely overlap — an ISP segment under heavy load produces timing signatures that partially resemble a TSPU device shaping traffic to 50 Kbps. Voidly's ML pipeline achieves 0.87 recall and 0.79 precision on throttling, the lowest precision of the five interference classes, and the measurement design reflects that fundamental difficulty: more signals, more baselines, and more cross-probe corroboration than any other class.
The timing measurement stack
Every Voidly HTTP measurement produces a TimingFeatures struct that captures the full protocol lifecycle from DNS resolution through body transfer. For the other four interference classes, most of the relevant signal lies in protocol outcomes rather than in timing: whether DNS returned NXDOMAIN, whether TLS completed, whether the HTTP status was 200. For throttling, the signal lives in the body transfer phase and in the normalized z-scores computed against per-ASN, per-domain baselines.
```rust
pub struct TimingFeatures {
    pub dns_latency_ms: u32,
    pub tcp_connect_ms: u32,
    pub tls_handshake_ms: u32,
    pub ttfb_ms: u32, // time-to-first-byte
    pub body_transfer_ms: u32,
    pub dns_latency_zscore: f32, // normalized vs. 90-day (asn, domain) baseline
    pub tcp_connect_zscore: f32,
    pub ttfb_zscore: f32,
    pub connection_faster_than_expected_rtt: bool, // RST injection signal
    pub body_truncated: bool,
    pub rst_during_body: bool,
    pub throughput_bps: Option<u64>, // computed if body_length known
    pub throughput_ratio_vs_control: Option<f32>, // vs simultaneous control measurement
}
```

The z-score fields are the primary throttling features: they normalize raw latency against what is normal for this specific (ASN, domain) pair, stripping out the baseline latency variation between a probe in Tehran and a probe in Tokyo. A TTFB of 800ms may be unremarkable for a probe on a low-bandwidth mobile connection measuring a server geographically distant from it, but a TTFB z-score of 6.2 means that 800ms is six standard deviations above what this probe normally sees for this domain, a signal that warrants classification.
The throughput_ratio_vs_control field requires a simultaneous control measurement: a request to the same URL made concurrently from a Voidly control server node in an uncensored network. When this ratio is available, it is the strongest single signal for throttling — it directly answers whether this probe's transfer rate is anomalously low relative to what an uncensored observer sees for the same resource at the same moment.
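As a rough illustration of how the two throughput fields relate, the sketch below derives a bits-per-second figure from a body size and transfer time, then divides the probe's figure by the control's. The function names and the choice to return None when a value cannot be computed are assumptions made for this example (mirroring the Option types in the struct), not Voidly's actual code.

```python
from typing import Optional


def throughput_bps(body_bytes: int, body_transfer_ms: int) -> Optional[float]:
    """Bits per second over the body transfer phase, or None if unmeasurable."""
    if body_bytes <= 0 or body_transfer_ms <= 0:
        return None
    # bytes -> bits (x8), milliseconds -> seconds (x1000 in the numerator)
    return body_bytes * 8 * 1000 / body_transfer_ms


def throughput_ratio_vs_control(
    probe_bps: Optional[float],
    control_bps: Optional[float],
) -> Optional[float]:
    """Probe throughput divided by control throughput; None if either is missing."""
    if probe_bps is None or control_bps is None or control_bps <= 0:
        return None
    return probe_bps / control_bps


# Same 24,000-byte resource: the probe needs 4 s, the control needs 200 ms.
probe = throughput_bps(24_000, 4_000)
control = throughput_bps(24_000, 200)
ratio = throughput_ratio_vs_control(probe, control)
```

In this example the probe transfers at 48 Kbps and the control at 960 Kbps, giving a ratio of 0.05. When the control measurement is missing, the ratio stays None rather than defaulting to a number that a classifier could misread.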
TTFB z-score computation
Each (ASN, domain) pair maintains a 90-day rolling baseline of TTFB measurements, stored as a mean and standard deviation updated nightly. The baseline is computed exclusively from measurements tagged interference_type == null — clean measurements with no detected anomaly. This ensures the baseline reflects genuine network conditions for this probe's ISP segment, not a contaminated average that includes periods when the ISP was already throttling.
```python
def compute_ttfb_zscore(
    current_ttfb_ms: float,
    asn: str,
    domain: str,
    baseline_store: BaselineStore,
) -> float:
    baseline = baseline_store.get(asn=asn, domain=domain)
    if baseline is None or baseline.sample_count < 30:
        return 0.0  # insufficient history: abstain
    zscore = (current_ttfb_ms - baseline.mean_ms) / (baseline.std_ms + 1e-6)
    return round(zscore, 3)
```

The minimum sample count of 30 is a deliberate conservatism. A baseline computed from fewer observations has high variance: a single unusual measurement can inflate the standard deviation enough to suppress real z-scores, or deflate it enough to generate spurious ones. For (ASN, domain) pairs with insufficient history, the classifier abstains from the z-score feature and relies on body truncation and throughput ratio signals instead.
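The nightly baseline refresh described above might look like the following sketch. The Sample and Baseline types, the age_days field, and the use of the sample (n-1) standard deviation are assumptions for illustration; the text specifies only the 90-day window, the clean-measurement filter, and the stored mean and standard deviation.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional, Sequence


@dataclass
class Sample:  # hypothetical record of one past measurement
    ttfb_ms: float
    age_days: float
    interference_type: Optional[str]  # None means no anomaly was detected


@dataclass
class Baseline:  # hypothetical stored baseline for one (ASN, domain) pair
    mean_ms: float
    std_ms: float
    sample_count: int


def build_baseline(
    samples: Sequence[Sample], window_days: int = 90
) -> Optional[Baseline]:
    """Aggregate clean TTFB samples inside the rolling window."""
    clean = [
        s.ttfb_ms
        for s in samples
        if s.interference_type is None and s.age_days <= window_days
    ]
    if len(clean) < 2:
        return None  # stdev() needs at least two data points
    return Baseline(mean_ms=mean(clean), std_ms=stdev(clean), sample_count=len(clean))
```

Excluding flagged samples keeps a throttled period from inflating both the mean and the standard deviation, which would otherwise shrink the z-scores of later throttled measurements. The sample_count is returned rather than enforced here so that the caller can apply the 30-sample abstention rule.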
A z-score above 3.0 triggers the throttling classifier. Above 5.0, the classifier treats the measurement as near-certain throttling: a z-score this extreme has no plausible congestion explanation. Sustained network congestion in an ISP segment typically elevates TTFB uniformly across all domains, not selectively for a subset of politically sensitive ones; the selectivity test described under “Congestion vs. deliberate throttling” below provides the additional confirmation.
Body truncation signals
Three patterns in the body transfer phase indicate throttling independently of the z-score computation:
body_truncated. The Content-Length response header declares an expected body size, but the received body is substantially smaller and the connection closed before the transfer completed. An ISP rate-limiter that drops the connection after delivering a partial response produces exactly this pattern: the TLS session completed cleanly, the HTTP headers arrived normally, and then the transfer stalled and reset. A Content-Length of 24 KB with 8 KB received before connection close is characteristic. This is distinct from a server error: a server that sends a truncated response usually does so with a Connection: close header and a matching status, whereas a rate-limiter closes the connection mid-transfer without a proper HTTP termination sequence.
rst_during_body. A TCP RST received during the body transfer phase, after TLS and HTTP headers completed successfully. The timing of the RST is the classifier feature: RSTs during the TLS handshake indicate TLS interference; RSTs during the TCP handshake, at the SYN/SYN-ACK stage, indicate TCP-level blocking. An RST that arrives specifically during the body transfer, after the server has already committed to sending a response, is the signature of a middlebox that monitors transfer volume and injects an RST when a threshold is crossed.
throughput_ratio_vs_control < 0.15. The probe's measured transfer rate is below 15% of what the control server sees for the same URL at the same time. This ratio threshold was calibrated against documented throttling incidents — the TSPU device cases in Russia consistently produce ratios in the 0.02–0.08 range; Iran's ARRS throttling during protest periods produces 0.05–0.12. Legitimate congestion, by contrast, rarely reduces throughput below 30–40% of the uncensored baseline for the same resource, because congestion affects the path symmetrically.
```rust
fn classify_throttling_signals(http: &HttpResult, timing: &TimingFeatures) -> ThrottlingSignals {
    let severe_throughput_drop = timing
        .throughput_ratio_vs_control
        .map(|ratio| ratio < 0.15)
        .unwrap_or(false);

    // Caveat: if body_first_4096 holds only the first 4096 bytes of the body,
    // `received` can never exceed 4096, so any declared length above roughly
    // 12 KiB satisfies this check even when the transfer completed. A count of
    // all bytes received would be the correct value to compare against.
    let received = http.body_first_4096.len();
    let body_truncated = http
        .body_length
        .map(|expected| expected > 4096 && received < expected as usize / 3)
        .unwrap_or(false);

    ThrottlingSignals {
        body_truncated,
        rst_during_body: timing.rst_during_body,
        throughput_suppressed: severe_throughput_drop,
        ttfb_elevated: timing.ttfb_zscore > 3.0,
        high_confidence_throttle: timing.ttfb_zscore > 5.0
            || (timing.rst_during_body && severe_throughput_drop),
    }
}
```

The high_confidence_throttle flag is set by either of two conditions. A TTFB z-score above 5.0 alone is sufficient, because that magnitude has no congestion explanation in the empirical record. The combination of rst_during_body and a sub-0.15 throughput ratio is also sufficient, because both signals point to active middlebox behavior rather than passive congestion.
Congestion vs. deliberate throttling — the calibration problem
This is the core difficulty of throttling classification. Both congestion and deliberate throttling elevate TTFB and depress throughput. The classifier must distinguish them without direct visibility into the ISP's network infrastructure, relying entirely on behavioral patterns observable from the probe's vantage point.
| Signal | Congestion | Deliberate throttling |
|---|---|---|
| TTFB z-score | Elevated (2–4) | Elevated (3–8+) |
| throughput_ratio | Low (0.15–0.5) | Very low (<0.15) |
| All probes in same ASN affected | Sometimes | Often |
| Multiple domains affected | Yes | Selective (target domains) |
| IODA BGP signal | Normal | Normal |
| Time of day pattern | Peak hours | All hours |
| Duration | Minutes–hours | Days–weeks |
The selectivity row (“Multiple domains affected”) is the most discriminating. Congestion degrades all traffic on a congested path; it is agnostic to the destination domain. Deliberate throttling targets specific domains or IP ranges: the TSPU device in Russia applies a policy that matches against a block list, so Tor and Psiphon endpoints are throttled while ordinary HTTPS traffic flows normally. When a single probe sees elevated TTFB for instagram.com but normal TTFB for google.com in the same measurement window, congestion in the access network is not a viable explanation, because both requests traverse the same ISP segment on their way out.
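One way to express the selectivity check in code is to compare a target domain's TTFB z-score against the z-scores the same probe recorded for other domains in the same window. This is a minimal sketch under stated assumptions: the function name, the REFERENCE_MAX_Z cutoff of 2.0, and the three-reference minimum are hypothetical; only the 3.0 elevation threshold comes from the text.

```python
from statistics import median
from typing import Mapping, Optional

ELEVATED_Z = 3.0       # elevation threshold stated in the article
REFERENCE_MAX_Z = 2.0  # hypothetical cutoff for "other domains look normal"


def is_selective_elevation(
    zscores: Mapping[str, float],
    target: str,
    min_references: int = 3,
) -> Optional[bool]:
    """True if only the target looks slow; None if there is too little data."""
    if target not in zscores:
        return None
    references = [z for domain, z in zscores.items() if domain != target]
    if len(references) < min_references:
        return None  # not enough comparison domains to judge selectivity
    target_elevated = zscores[target] > ELEVATED_Z
    others_normal = median(references) < REFERENCE_MAX_Z
    return target_elevated and others_normal


# One probe, one window: the target is slow, the reference domains are not.
throttled = is_selective_elevation(
    {"instagram.com": 6.2, "google.com": 0.4, "example.org": -0.3, "example.net": 0.9},
    target="instagram.com",
)
# Every domain is slow at once, which is the congestion pattern.
congested = is_selective_elevation(
    {"instagram.com": 3.6, "google.com": 3.4, "example.org": 3.1, "example.net": 3.8},
    target="instagram.com",
)
```

The median is used for the reference set so that one slow reference server does not mask an otherwise clean comparison.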
The time-of-day pattern provides a secondary separator. Network congestion is load-driven: it concentrates during evening peak hours when residential bandwidth is most contested. Deliberate throttling operates on a policy clock, not a load clock — it is active at 3am on a Tuesday as consistently as at 8pm on a Friday. Duration extends this: congestion clears when load drops, usually within hours. Throttling incidents in the Voidly dataset have a median duration of 11 days.
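The load-clock versus policy-clock distinction can be approximated by comparing how often measurements are elevated outside peak hours with how often they are elevated during peak hours. The sketch below assumes a hypothetical peak window of 18:00 to 23:59 local time and a simple list of (hour, elevated) observations; neither is taken from Voidly's pipeline.

```python
from typing import Iterable, Optional, Tuple

PEAK_HOURS = range(18, 24)  # hypothetical evening peak, local time


def off_peak_to_peak_ratio(
    observations: Iterable[Tuple[int, bool]],
) -> Optional[float]:
    """Elevation rate off-peak divided by elevation rate at peak.

    Near 1.0: elevation is equally common at all hours (policy clock).
    Near 0.0: elevation is concentrated in peak hours (load clock).
    """
    peak_total = peak_elevated = off_total = off_elevated = 0
    for hour, elevated in observations:
        if hour in PEAK_HOURS:
            peak_total += 1
            peak_elevated += int(elevated)
        else:
            off_total += 1
            off_elevated += int(elevated)
    if peak_total == 0 or off_total == 0 or peak_elevated == 0:
        return None  # cannot form a meaningful ratio
    return (off_elevated / off_total) / (peak_elevated / peak_total)


# Elevated around the clock versus elevated only from 19:00 to 22:59.
always_on = off_peak_to_peak_ratio([(h, True) for h in range(24)])
evening_only = off_peak_to_peak_ratio([(h, 19 <= h <= 22) for h in range(24)])
```

Here always_on evaluates to 1.0 and evening_only to 0.0. Real data would fall between the two, and any cutoff between them would need calibration against labeled incidents.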
Cross-probe corroboration
Single-probe TTFB elevation is the weakest form of throttling evidence. A probe on a residential connection can see elevated TTFB for many reasons unrelated to censorship: the operator's home network is congested, their ISP has a transient peering problem, or the target server is temporarily slow. The classifier's confidence increases substantially when multiple probes in geographically distributed segments of the same country simultaneously observe TTFB elevation for the same domain.
The corroboration score applies the same independence-weighted aggregation used in the confidence tier system, but restricted to same-country probes and weighted by ASN diversity rather than geographic diversity. Probes on the same ASN are partially correlated — they share peering infrastructure — so they contribute less weight than probes on distinct ASNs within the country:
```python
def throttling_corroboration_score(
    domain: str,
    country: str,
    window_minutes: int = 30,
) -> float:
    """
    Compute a corroboration score [0, 1] for throttling of domain in country
    over the past window_minutes. Higher is more confident.
    """
    elevated_probes = [
        m
        for m in recent_measurements(domain=domain, country=country, minutes=window_minutes)
        if m.timing.ttfb_zscore > 3.0
        or (
            m.timing.throughput_ratio_vs_control is not None
            and m.timing.throughput_ratio_vs_control < 0.15
        )
    ]
    if not elevated_probes:
        return 0.0

    # Group by ASN. Within an ASN, additional probes contribute diminishing weight.
    by_asn: dict[str, list] = {}
    for m in elevated_probes:
        by_asn.setdefault(m.probe_asn, []).append(m)

    score = 0.0
    for asn, probes in by_asn.items():
        # First probe from each ASN contributes full weight.
        # The n-th additional probe from the same ASN contributes 1/(2^n).
        asn_contribution = sum(1.0 / (2 ** i) for i in range(len(probes)))
        score += asn_contribution

    # Normalize: 3+ independent ASNs is considered near-certain
    return min(1.0, score / 3.0)
```

A corroboration score above 0.8 reaches the VERIFIED confidence tier; three probes on distinct ASNs that all flag the same domain within 30 minutes score 1.0. Because each ASN's contribution stays below 2.0 before normalization, probes on a single ASN can never exceed about 0.67, so VERIFIED always requires at least two ASNs. Scores in the 0.4–0.8 range reach CORROBORATED. A single probe scores 0.33, below 0.4, and remains SINGLE_PROBE regardless of the z-score magnitude, because the calibration problem means we cannot confidently distinguish congestion from throttling without corroboration at that scale.
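To make the score arithmetic concrete, the sketch below reproduces the per-ASN weighting from the function above using only probe counts, then maps the result onto the three tiers. The helper names are hypothetical, and treating exactly 0.4 and exactly 0.8 as CORROBORATED is an assumption, since the text does not say which side the boundaries fall on.

```python
from typing import Sequence


def corroboration_from_counts(probes_per_asn: Sequence[int]) -> float:
    """Score for a list of flagged-probe counts, one entry per ASN."""
    raw = sum(
        sum(1.0 / (2 ** i) for i in range(count))  # 1 + 1/2 + 1/4 + ...
        for count in probes_per_asn
    )
    return min(1.0, raw / 3.0)


def confidence_tier(score: float) -> str:
    """Map a corroboration score onto the tier names used in the article."""
    if score > 0.8:
        return "VERIFIED"
    if score >= 0.4:
        return "CORROBORATED"
    return "SINGLE_PROBE"


scenarios = {
    "one probe": [1],                 # 1.0 / 3
    "two probes, same ASN": [2],      # 1.5 / 3
    "ten probes, same ASN": [10],     # just under 2.0 / 3
    "two ASNs, one probe each": [1, 1],
    "two ASNs, three probes": [2, 1],
    "three ASNs, one probe each": [1, 1, 1],
}
tiers = {
    name: confidence_tier(corroboration_from_counts(counts))
    for name, counts in scenarios.items()
}
```

Under these assumptions, two probes on one ASN plus a third on another score about 0.83 and reach VERIFIED, while ten probes on a single ASN stay at CORROBORATED.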
Known throttling patterns by country
Voidly's historical dataset contains well-characterized throttling signatures for several countries that recur consistently enough to inform the calibration thresholds.
Russia. TSPU (Technical Means of Countering Threats) devices deployed under Russia's sovereign internet law throttle Tor and Psiphon traffic to 25–50 Kbps while leaving other HTTPS at normal speeds. The signature is highly selective: throughput_ratio_vs_control drops below 0.05 for domains on the Roskomnadzor block list while simultaneously-measured domains outside the list show ratios above 0.9. The TLS handshake succeeds — the TSPU inspects the SNI and applies the rate limit post-handshake rather than blocking the connection — which is why these events do not classify as TLS interference.
Iran. ARRS (Telecommunication Infrastructure Company) and FARNet ISPs throttle social media platforms during protest periods. Instagram TTFB spikes 10–20× relative to the 90-day baseline, generating z-scores consistently above 8.0, while news sites and government domains are unaffected. The time-of-day independence is strong: the throttling runs at consistent intensity across all hours, including overnight, ruling out congestion as the explanation. Voidly logged 14 distinct throttling incidents in Iran during 2024, all correlated with documented protest activity.
India. Encrypted messaging apps — WhatsApp Web in particular — show documented throttling during regional elections. The primary signal is body_truncated: WhatsApp's web client downloads a large initial bundle, and ISPs implementing the election-period throttling allow the TLS handshake and HTTP response headers to complete before rate-limiting the transfer and then resetting the connection, producing a truncated body with a TCP RST at the cut-off point.
China. The Great Firewall more commonly uses DNS poisoning or TLS SNI blocking rather than throttling, making throttling events rarer in the China dataset. When observed, they concentrate around circumvention tool download sites: domains hosting Shadowsocks or V2Ray configuration downloads that the GFW has not yet added to the full block list will sometimes show throughput suppression at 0.10–0.20 before being promoted to a full DNS/TLS block. This makes China's throttling events useful as leading indicators of imminent full blocking.
Why throttling has the lowest classifier precision
The 0.79 precision figure for throttling, compared to 0.94+ for the other four interference classes, is traceable to a specific weakness in the training data pipeline. The weak supervision label functions that anchor the training labels for DNS tampering, TLS interference, and HTTP blocking — DNS NXDOMAIN detection, TLS certificate mismatch hashing, HTTP blockpage fingerprint matching — all produce high-confidence labels when they fire. These signals are structurally unambiguous: an NXDOMAIN for a domain that resolves correctly from the control server is a DNS tamper with near-zero false positive rate.
Throttling has no equivalent binary signal. None of the weak supervision label functions fire on a throttled measurement — the DNS response is correct, TLS completes, the HTTP status is 200, and no blockpage hash matches because the body is just slow or truncated, not replaced. The throttling labels in the training set come primarily from two sources: cross-probe corroboration events where the corroboration score exceeded 0.8, and IODA BGP data cross-referenced against documented throttling incidents in the public record (OONI Explorer reports, Freedom House country assessments, journalist reports of platform slowdowns). Both sources are noisier than blockpage or DNS signals.
The weekly active learning annotation cycle described in the ML training pipeline article is particularly important for throttling. Human annotators reviewing the uncertain-region samples — the 0.3–0.7 probability range where the classifier is least confident — are predominantly looking at throttling candidates. The 40–60 annotated samples per week improve the throttling decision boundary more than they improve any other class, because the other classes have abundant high-confidence weak supervision to anchor their boundaries while throttling relies more heavily on the human annotations.
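Selecting the uncertain-region samples for annotators could be sketched as follows. The 0.3–0.7 band and the weekly budget of up to 60 come from the text; the function name, the (id, probability) tuple shape, and the choice to prioritize samples closest to 0.5 are assumptions for illustration, not a description of the actual annotation queue.

```python
from typing import List, Sequence, Tuple

Prediction = Tuple[str, float]  # (measurement id, predicted throttling probability)


def select_for_annotation(
    predictions: Sequence[Prediction],
    low: float = 0.3,
    high: float = 0.7,
    budget: int = 60,
) -> List[Prediction]:
    """Pick up to `budget` predictions from the uncertain band, most uncertain first."""
    uncertain = [p for p in predictions if low <= p[1] <= high]
    # Assumption: distance from 0.5 is used as the uncertainty ordering.
    uncertain.sort(key=lambda p: abs(p[1] - 0.5))
    return uncertain[:budget]


batch = [("a", 0.10), ("b", 0.50), ("c", 0.66), ("d", 0.35), ("e", 0.90), ("f", 0.45)]
top_two = select_for_annotation(batch, budget=2)
whole_band = select_for_annotation(batch)
```

Here top_two contains measurements b and f, the two nearest to 0.5, while a and e are never queued because the classifier is already confident about them.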
Recall is higher than precision (0.87 vs. 0.79) because the classifier is tuned to favor false positives over false negatives for throttling: a missed throttling event represents a genuine censorship incident that goes undetected, while a false positive throttling call generates a SINGLE_PROBE flagged measurement that a human reviewer or downstream consumer can filter. The confidence tier system provides the practical filter — false positives at SINGLE_PROBE confidence are expected and labeled as such; the dataset consumers who need high precision can restrict to CORROBORATED or VERIFIED.
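A downstream consumer that needs higher precision could apply the tier filter along these lines. The record layout (a dict with a confidence_tier key) is a hypothetical shape chosen for this example; the tier names and their ordering come from the text.

```python
from typing import Dict, Iterable, List

# Ordered from least to most corroborated.
TIER_RANK = {"SINGLE_PROBE": 0, "CORROBORATED": 1, "VERIFIED": 2}


def filter_by_min_tier(
    records: Iterable[Dict[str, str]],
    min_tier: str = "CORROBORATED",
) -> List[Dict[str, str]]:
    """Keep only records at or above the requested confidence tier."""
    floor = TIER_RANK[min_tier]
    return [r for r in records if TIER_RANK[r["confidence_tier"]] >= floor]


records = [
    {"domain": "instagram.com", "confidence_tier": "SINGLE_PROBE"},
    {"domain": "instagram.com", "confidence_tier": "CORROBORATED"},
    {"domain": "instagram.com", "confidence_tier": "VERIFIED"},
]
high_precision = filter_by_min_tier(records)
strictest = filter_by_min_tier(records, min_tier="VERIFIED")
```

Filtering on the consumer side leaves the SINGLE_PROBE records in the dataset, so the recall-oriented tuning is preserved for users who want every candidate event.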
For how Voidly monitors probe node health and detects when a probe goes offline vs. when its ISP is blocking the collector: Voidly probe health monitoring: how we detect and replace failing probe nodes →
For the control server network that provides the simultaneous control measurements used to compute throughput_ratio_vs_control: The Voidly control server: how we tell censorship from a bad network →
For the full protocol lifecycle that produces the TimingFeatures struct — DNS through TLS to HTTP body: How Voidly measures HTTP and HTTPS censorship: the full protocol lifecycle from DNS through TLS to body comparison →
For the ML classifier that consumes these throttling features — and why throttling has lower precision than the other four interference classes: Voidly's ML training pipeline: building a labeled censorship dataset from OONI measurements →
For how throttling incidents are clustered and tracked over time — the 6-hour gap rule, RESOLVED_PENDING window, and FLAPPING detection: Incident clustering and deduplication: how Voidly avoids counting the same censorship event twice →