
Voidly's TCP measurement layer: RST injection detection, null-routing, and connection timing analysis

8 min read · AI Analytics
Censorship · Voidly · Methodology

When a Voidly probe can't reach a target, the failure often shows up first at the TCP layer. The HTTP client sees a connection error. The TLS handshake never starts. But the actual cause — a forged RST packet injected by a deep-packet inspection device, or a BGP black-hole that drops packets before they ever arrive — is invisible to higher layers without dedicated TCP instrumentation.

TCP is also the layer where two fundamentally different government blocking mechanisms produce symptoms that look identical to the application: a connection that simply doesn't work. RST injection and null-routing require different detection logic, map to different interference classes in the anomaly classifier, and correlate with different censorship regimes. Getting them right is the difference between attributing an incident to active TCP interference and attributing it to a BGP withdrawal.

Two mechanisms, two fingerprints

The GFW in China and RST-based censors in Russia use RST injection: a middlebox forges a TCP RST packet, spoofing the target server's IP as the source address, to kill the connection. The probe's kernel receives the RST and tears down the connection immediately. From the probe's perspective the server itself sent the RST, but the real server never sent it.

Iran, Pakistan, and several Gulf-state censors more commonly use null-routing: the ISP or a transit carrier advertises a more-specific BGP prefix for the target IP and points it at a discard interface. Packets leave the probe, reach the ISP, and are silently dropped. No RST arrives. No ICMP unreachable. The SYN just disappears. The probe's connect() call eventually times out.

These two behaviors require separate detection paths. An RST that arrives 200ms after the SYN-ACK looks nothing like an RST that arrives 4ms after — yet both are reported as "connection reset by peer" to the application. Timeout with no RST could be null-routing or it could be an overloaded server or a routing flap. Every field in the TcpResult struct exists to answer one specific disambiguation question.
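To make the information loss concrete, here is a minimal sketch of what survives once an observation is collapsed into the error an application sees. The `Observation` type and `what_application_sees` function are hypothetical names used for illustration, not part of the Voidly codebase.

```rust
use std::io::ErrorKind;

/// Hypothetical raw TCP-layer observation (illustration only).
pub enum Observation {
    /// An RST arrived this many milliseconds after the SYN-ACK.
    Rst { ms_after_syn_ack: f32 },
    /// The SYN went unanswered until the timeout expired.
    Timeout { control_connected: bool },
}

/// What an HTTP or TLS client is left with: only the error kind.
pub fn what_application_sees(obs: &Observation) -> ErrorKind {
    match obs {
        // The timing, which separates injected from server-side RSTs, is dropped.
        Observation::Rst { .. } => ErrorKind::ConnectionReset,
        // Whether the control server connected is dropped as well.
        Observation::Timeout { .. } => ErrorKind::TimedOut,
    }
}
```

A 4ms RST and a 200ms RST both come out as `ConnectionReset`, and a null-routed SYN looks the same as a SYN to a dead server, which is why the probe records the raw observables rather than the error.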

The TcpResult struct

pub struct TcpResult {
    pub connected: bool,
    pub connect_time_ms: Option<f32>,
    pub rst_received: bool,
    pub rst_timing_ms: Option<f32>,   // ms after SYN-ACK
    pub is_injected_rst: bool,         // rst_timing_ms < 15.0
    pub null_routed: bool,             // timeout with no RST, no ICMP
    pub icmp_unreachable: bool,
    pub control_connect_time_ms: Option<f32>,
    pub connect_time_delta_ms: Option<f32>,  // vs control
}

Each field captures a specific observable at the TCP layer:

  • connected / connect_time_ms. Whether the three-way handshake completed and how long it took. A successful connection does not mean no censorship — transparent proxies intercept the connection and establish their own socket, so connected can be true while the traffic is being inspected or redirected.
  • rst_received / rst_timing_ms. Whether a RST was observed and how many milliseconds elapsed between the SYN-ACK and the RST arrival. This timing is the primary input to RST classification. Natural RSTs from legitimate servers arrive after the server has had time to process some request content — almost always more than 80ms in practice.
  • is_injected_rst. Derived field: true when rst_timing_ms < 15.0. The 15ms threshold is calibrated from empirical data and described in detail below.
  • null_routed. True when the SYN-ACK wait times out (5 second budget) with no RST and no ICMP. Set only when the control connection to the same IP succeeds in the same window — otherwise the timeout is attributed to packet loss or an unreachable server.
  • icmp_unreachable. Some routers send ICMP type 3 code 1 (host unreachable) or type 3 code 0 (network unreachable) when null-routing. This is a weaker signal than timeout: ICMP unreachable can also originate from NAT traversal failures or carrier-grade NAT artifacts. It's recorded separately and weighted lower in the classifier.
  • control_connect_time_ms / connect_time_delta_ms. The control server's TCP connect time to the same target, and the delta between the probe's connect time and the control's. A positive delta means the probe takes longer, which is expected due to geography, but a delta more than 200ms above the calibrated baseline on an otherwise successful connection flags routing-level interference.
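The derived fields in the list above can be sketched as a single function over the raw observations. This is an illustration under stated assumptions: `RawObservation` and `derive` are hypothetical names, and the struct is repeated only so the sketch compiles on its own.

```rust
/// Copy of the struct shown earlier, repeated so this sketch is self-contained.
#[derive(Debug, Default)]
pub struct TcpResult {
    pub connected: bool,
    pub connect_time_ms: Option<f32>,
    pub rst_received: bool,
    pub rst_timing_ms: Option<f32>,
    pub is_injected_rst: bool,
    pub null_routed: bool,
    pub icmp_unreachable: bool,
    pub control_connect_time_ms: Option<f32>,
    pub connect_time_delta_ms: Option<f32>,
}

/// Hypothetical raw inputs for one probe attempt (illustration only).
pub struct RawObservation {
    pub connect_time_ms: Option<f32>,         // Some(_) if the handshake completed
    pub rst_timing_ms: Option<f32>,           // Some(_) if an RST was seen
    pub timed_out: bool,                      // the 5-second budget expired
    pub icmp_unreachable: bool,
    pub control_connect_time_ms: Option<f32>, // Some(_) if the control connected
}

pub fn derive(raw: &RawObservation) -> TcpResult {
    let rst_received = raw.rst_timing_ms.is_some();
    // Injected if the RST arrived less than 15ms after the SYN-ACK.
    let is_injected_rst = raw.rst_timing_ms.map_or(false, |t| t < 15.0);
    // Null-routed only if nothing came back AND the control reached the same IP.
    let null_routed = raw.timed_out
        && !rst_received
        && !raw.icmp_unreachable
        && raw.control_connect_time_ms.is_some();
    // The delta is defined only when both sides completed the handshake.
    let connect_time_delta_ms = match (raw.connect_time_ms, raw.control_connect_time_ms) {
        (Some(probe), Some(control)) => Some(probe - control),
        _ => None,
    };
    TcpResult {
        connected: raw.connect_time_ms.is_some(),
        connect_time_ms: raw.connect_time_ms,
        rst_received,
        rst_timing_ms: raw.rst_timing_ms,
        is_injected_rst,
        null_routed,
        icmp_unreachable: raw.icmp_unreachable,
        control_connect_time_ms: raw.control_connect_time_ms,
        connect_time_delta_ms,
    }
}
```

Keeping the derived flags next to the raw timings means a later change to a threshold can be re-applied to stored measurements without re-probing.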

RST injection detection: the 15ms threshold

The core insight behind RST injection detection is that injected RSTs are impatient. A censor's DPI device has already matched the target IP or SNI against its blocklist before the SYN-ACK is transmitted. By the time the SYN-ACK reaches the probe and the probe's kernel acknowledges it, the DPI device has already sent the RST — without waiting for the probe to send any application data at all.

Legitimate server RSTs look nothing like this. A server sends a RST after receiving data it can't process: a malformed TLS ClientHello, a request to a closed port after an application restart, a rate-limit enforcement event. All of these require the server to first receive and process incoming data — which means the RST arrives after at least one application-layer round trip, adding 80–400ms of latency depending on geography.

The 15ms threshold is not arbitrary. It was derived from analysis of 1.2M confirmed censorship events (independently verified against OONI data from the same probes and time windows) and 800K clean measurements from the same probe fleet. The statistical result:

  • Less than 0.1% of legitimate server RSTs arrive in under 15ms after SYN-ACK.
  • More than 94% of confirmed GFW RSTs arrive in under 8ms.
  • The 1ms–8ms peak corresponds to the round-trip through the GFW device and back.
  • The 8ms–15ms range catches RST injection by slower DPI hardware (Russia's TSPU devices cluster here).

The classification function:

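/// How an observed RST is interpreted, based on its delay after the SYN-ACK.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RstClass { Injected, Ambiguous, ServerSide }

fn classify_rst_timing(rst_timing_ms: f32) -> RstClass {
    match rst_timing_ms {
        t if t < 15.0  => RstClass::Injected,    // under the 15ms injection threshold
        t if t < 80.0  => RstClass::Ambiguous,   // rare, not counted as censorship
        _              => RstClass::ServerSide,  // 80ms or later (a NaN also lands here)
    }
}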

The Ambiguous band (15ms–80ms) exists because some legitimately fast servers on low-latency links can send RSTs in this range after processing a malformed handshake. These are not classified as injected RSTs; they contribute a small weight to the classifier's uncertainty rather than directly triggering an interference flag.
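The threshold statistics quoted in this section are fractions of RST timings falling under a cutoff. A minimal sketch of that check follows; `fraction_below` is a hypothetical helper, and any sample values fed to it here are synthetic, not drawn from the Voidly dataset.

```rust
/// Fraction of RST timings (ms after SYN-ACK) that fall strictly below a threshold.
/// Returns 0.0 for an empty sample so callers never divide by zero.
pub fn fraction_below(samples_ms: &[f32], threshold_ms: f32) -> f64 {
    if samples_ms.is_empty() {
        return 0.0;
    }
    let below = samples_ms.iter().filter(|&&t| t < threshold_ms).count();
    below as f64 / samples_ms.len() as f64
}
```

Run against a labelled set, a good threshold keeps `fraction_below(legitimate, 15.0)` near zero while `fraction_below(confirmed_injected, 15.0)` stays near one. The gap between the two is what the 15ms cutoff is chosen to maximize.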

Null-routing detection and disambiguation from packet loss

A null-routed SYN and a SYN to a genuinely unreachable server are indistinguishable at the probe alone: both result in a 5-second timeout with no response. The disambiguation comes from the control server comparison.

When a probe SYN times out, the measurement pipeline checks whether the control server successfully established a TCP connection to the same IP within the same 30-second window. If the control connects and the probe times out, null_routed is set to true. If the control also times out, the measurement is marked as a potential server outage or global routing issue, not censorship, and is excluded from incident attribution until the pattern is corroborated across multiple probes in the same country.
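The decision described above can be sketched as follows. The type names are hypothetical, and the reading of "same 30-second window" as "timestamps no more than 30 seconds apart" is an assumption; the pipeline may define the window differently.

```rust
#[derive(Debug, PartialEq)]
pub enum TimeoutVerdict {
    /// Probe timed out silently while the control reached the same IP.
    NullRouted,
    /// No usable control success: server outage or global routing issue.
    PossibleOutage,
    /// The probe did not time out silently, so this check does not apply.
    NotApplicable,
}

pub struct ProbeAttempt {
    pub at_unix_s: u64,
    pub timed_out: bool,
    pub rst_received: bool,
    pub icmp_unreachable: bool,
}

pub struct ControlAttempt {
    pub at_unix_s: u64,
    pub connected: bool,
}

/// Assumed interpretation of the 30-second comparison window.
pub const CONTROL_WINDOW_S: u64 = 30;

pub fn classify_timeout(probe: &ProbeAttempt, control: Option<&ControlAttempt>) -> TimeoutVerdict {
    if !probe.timed_out || probe.rst_received || probe.icmp_unreachable {
        return TimeoutVerdict::NotApplicable;
    }
    // The control only counts if it connected close enough in time to the probe.
    let control_ok = control.map_or(false, |c| {
        c.connected && c.at_unix_s.abs_diff(probe.at_unix_s) <= CONTROL_WINDOW_S
    });
    if control_ok {
        TimeoutVerdict::NullRouted
    } else {
        TimeoutVerdict::PossibleOutage
    }
}
```

A stale or missing control measurement falls through to `PossibleOutage`, which matches the conservative stance in the text: without a concurrent control success, a timeout is not attributed to censorship.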

Country-level patterns confirm this distinction. Iran's null-routing blocks show consistent timeout-without-RST signatures from probes within the country while probes outside Iran connect normally. China's GFW, by contrast, almost never null-routes — it injects RSTs, which is faster and easier to deploy at GFW scale. When an IP is both null-routed in Iran and RST-injected in China, these are recorded as separate interference events with separate mechanisms.

ICMP unreachable messages are captured but not treated as definitive null-routing evidence. Carrier-grade NAT deployments, particularly in Southeast Asia, generate ICMP type 3 code 13 (communication administratively prohibited) for unrelated reasons. The icmp_unreachable flag is recorded and fed to the classifier as a weak feature, requiring corroboration from the timeout signal before contributing to a bgp_withdrawal classification.
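For reference, a small sketch of how the relevant ICMP codes could be labelled and how the weak-feature rule might be expressed. The code values come from the ICMP specification; the function names are hypothetical, and the corroboration rule is a boolean simplification of what the classifier does with weights.

```rust
/// ICMP type for "destination unreachable".
pub const ICMP_DEST_UNREACHABLE: u8 = 3;

/// Human-readable label for the destination-unreachable codes discussed here.
/// Returns None for other types or for codes this sketch does not cover.
pub fn describe_icmp(icmp_type: u8, code: u8) -> Option<&'static str> {
    if icmp_type != ICMP_DEST_UNREACHABLE {
        return None;
    }
    match code {
        0 => Some("network unreachable"),
        1 => Some("host unreachable"),
        3 => Some("port unreachable"),
        13 => Some("communication administratively prohibited"),
        _ => None,
    }
}

/// ICMP unreachable only supports a bgp_withdrawal reading when the
/// timeout-based null-routing signal is also present.
pub fn icmp_supports_bgp_withdrawal(icmp_unreachable: bool, null_routed: bool) -> bool {
    icmp_unreachable && null_routed
}
```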

Dual-IP probing to locate the RST source

An RST that arrives within 15ms tells us a censor injected it, but not whether the censor is reacting to the destination IP or the TLS SNI. These require different remediation approaches and map to different policy mechanisms on the censor's side, so Voidly distinguishes them with a second probe.

When is_injected_rst is true, the probe immediately sends a second SYN — this time to a control IP (8.8.8.8) rather than the blocked target's IP, but with the blocked domain as the TLS SNI in the subsequent ClientHello. This is technically an invalid request to Google's DNS infrastructure, but it produces a valid censorship signal:

  • Second SYN also gets an RST within 15ms. The censor is watching TLS SNI, not destination IP. The block is SNI-triggered. rst_source = RstSource::SniBased. This is how China's GFW operates for most HTTPS targets.
  • Second SYN connects successfully. The RST was destination-IP-triggered: the censor maintains an IP blocklist and doesn't inspect SNI. rst_source = RstSource::IpBased. Blocks driven by Russia's Roskomnadzor IP blocklist work this way.
  • Second SYN is ambiguous or times out. rst_source = RstSource::Unknown. The additional probe failed to disambiguate, possibly because 8.8.8.8 itself is blocked in the target country or the censor reacted to the destination IP of the second probe.
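The three outcomes above map onto a small decision function. In this sketch the `RstSource` variants are the ones named in the text, while `SecondProbe` and `locate_rst_source` are hypothetical names for illustration.

```rust
#[derive(Debug, PartialEq)]
pub enum RstSource {
    SniBased,
    IpBased,
    Unknown,
}

/// Hypothetical outcome of the second probe (control IP, blocked domain as SNI).
pub enum SecondProbe {
    Rst { ms_after_syn_ack: f32 },
    Connected,
    TimedOut,
}

pub fn locate_rst_source(second: &SecondProbe) -> RstSource {
    match second {
        // Injected RST again, although the destination IP changed: the SNI is the trigger.
        SecondProbe::Rst { ms_after_syn_ack } if *ms_after_syn_ack < 15.0 => RstSource::SniBased,
        // Same SNI, different IP, no interference: the original IP was the trigger.
        SecondProbe::Connected => RstSource::IpBased,
        // A late RST or a timeout does not disambiguate.
        _ => RstSource::Unknown,
    }
}
```

Treating a late RST as `Unknown` rather than `SniBased` keeps the second probe subject to the same 15ms rule as the first.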

The dual-IP probe adds one extra TCP attempt per RST detection, roughly 28ms of additional measurement time on same-region targets, which is acceptable given the attribution value.

Connect time delta as a routing interference signal

Not all routing-level interference kills the connection. Transparent proxy insertion, where a DPI device intercepts the TCP connection and establishes its own socket to the target server, leaves the connection alive but adds a round trip. The connect_time_delta_ms field captures this.

The delta is computed as connect_time_ms - control_connect_time_ms, that is, the probe's connect time minus the control's. A positive delta is expected: the probe is inside the country being measured and the control server is outside it. The baseline delta for any probe-control pair is calibrated from clean measurements to the same IP over a 7-day rolling window.

A delta more than 200ms above the calibrated baseline on a connection that successfully completes is classified as ROUTE_ANOMALY. This is a weaker interference signal than an RST or timeout, because transparent proxies can also appear on legitimate CDN networks during maintenance. The ROUTE_ANOMALY label feeds the throttling interference class in the classifier rather than the harder censorship classes, and requires corroboration from at least two other probes in the same ASN to contribute to an incident.
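A minimal sketch of the baseline comparison and the corroboration rule follows. The text does not say which statistic the 7-day calibration uses, so the median here is an assumption, and all function and constant names are hypothetical.

```rust
/// Margin above the calibrated baseline that triggers ROUTE_ANOMALY.
pub const ROUTE_ANOMALY_MARGIN_MS: f32 = 200.0;
/// Other probes in the same ASN needed before the label counts toward an incident.
pub const MIN_CORROBORATING_PROBES: usize = 2;

/// Assumed calibration: the median of clean deltas from the rolling window.
pub fn baseline_delta_ms(clean_deltas_ms: &[f32]) -> Option<f32> {
    if clean_deltas_ms.is_empty() {
        return None;
    }
    let mut sorted = clean_deltas_ms.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

/// True when a completed connection is more than 200ms slower than its baseline.
pub fn is_route_anomaly(connected: bool, delta_ms: f32, baseline_ms: f32) -> bool {
    connected && (delta_ms - baseline_ms) > ROUTE_ANOMALY_MARGIN_MS
}

/// The anomaly only contributes to an incident with enough same-ASN corroboration.
pub fn contributes_to_incident(anomaly: bool, other_probes_same_asn: usize) -> bool {
    anomaly && other_probes_same_asn >= MIN_CORROBORATING_PROBES
}
```

Comparing against a per-pair baseline rather than a fixed number keeps a probe that is always 150ms slower than its control from being flagged on every measurement.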

Mapping TCP evidence to interference classes

The TCP layer produces four classifier inputs that feed into the 47-feature vector alongside DNS and TLS features:

  • is_injected_rst = true is a high-weight feature for both the tls_interference and http_blocking classes. Most RST injection occurs at the TLS SNI check stage, where the RST arrives before any HTTP is sent, so both classes receive the signal. The rst_source value further refines which class loads more heavily.
  • null_routed = true contributes to the bgp_withdrawal class. This aligns with BGP outage signals from IODA and RouteViews: when a prefix disappears from BGP tables at the same time the probe reports null-routing, the two signals are merged into a single corroborated event.
  • connect_time_delta_ms > 200 contributes to the throttling class. Throttling is defined as routing-level interference that degrades rather than blocks — elevated RTT is one of its primary signals alongside TTFB z-scores from the HTTP layer.
  • icmp_unreachable = true contributes a small weight to bgp_withdrawal but is not sufficient on its own. It typically acts as a tie-breaker when null_routed is also true.
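The mapping in the list above can be sketched as a function from TCP features to the classes they feed. The actual classifier uses numeric weights that are not given here, so this sketch uses a coarse hypothetical `Weight` label instead, with `Unspecified` marking contributions whose weight the text does not state.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum InterferenceClass {
    TlsInterference,
    HttpBlocking,
    BgpWithdrawal,
    Throttling,
}

/// Coarse stand-in for the classifier's numeric weights (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Weight {
    High,
    Unspecified,
    Low,
}

pub struct TcpFeatures {
    pub is_injected_rst: bool,
    pub null_routed: bool,
    pub icmp_unreachable: bool,
    pub connect_time_delta_ms: Option<f32>,
}

pub fn tcp_contributions(f: &TcpFeatures) -> Vec<(InterferenceClass, Weight)> {
    let mut out = Vec::new();
    if f.is_injected_rst {
        // Injected RSTs load on both the TLS and HTTP classes.
        out.push((InterferenceClass::TlsInterference, Weight::High));
        out.push((InterferenceClass::HttpBlocking, Weight::High));
    }
    if f.null_routed {
        out.push((InterferenceClass::BgpWithdrawal, Weight::Unspecified));
    }
    if f.connect_time_delta_ms.map_or(false, |d| d > 200.0) {
        out.push((InterferenceClass::Throttling, Weight::Unspecified));
    }
    if f.icmp_unreachable {
        // Small weight: not sufficient on its own, acts as a tie-breaker.
        out.push((InterferenceClass::BgpWithdrawal, Weight::Low));
    }
    out
}
```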

Performance on the probe client

TCP measurement is the fastest layer in the probe's stack. The SYN-ACK wait is wrapped in a tokio::time::timeout of 5 seconds. At p50, TCP connect completes in 28ms for same-region targets; at p99, cross-continental connections reach 320ms. RST classification adds under 1ms — it is a single floating-point comparison against the 15ms threshold. Dual-IP probing, triggered only on RST detection, adds one additional TCP attempt of approximately 28ms on same-region targets.

The 5-second timeout budget is the dominant cost when null-routing is encountered. For countries like Iran where null-routing is the primary mechanism, a significant fraction of probe measurements wait out the full 5 seconds before reporting null_routed = true. The probe scheduler accounts for this by reducing the per-country test rate when null-routing prevalence exceeds 30% of measurements in a window, avoiding timeout-induced backpressure on the measurement queue.
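The scheduler rule can be sketched as below. The 30% limit is from the text; the size of the reduction is not stated, so the halving factor is a hypothetical placeholder, as are the function and constant names.

```rust
/// Null-routing prevalence above which the per-country test rate is reduced.
pub const NULL_ROUTE_PREVALENCE_LIMIT: f64 = 0.30;
/// Hypothetical reduction factor: the real scheduler's value is not documented here.
pub const BACKOFF_FACTOR: f64 = 0.5;

/// Per-country test rate after accounting for null-routing prevalence in a window.
pub fn adjusted_rate(base_rate_per_min: f64, null_routed: usize, total: usize) -> f64 {
    if total == 0 {
        // No measurements in the window: nothing to base a reduction on.
        return base_rate_per_min;
    }
    let prevalence = null_routed as f64 / total as f64;
    if prevalence > NULL_ROUTE_PREVALENCE_LIMIT {
        base_rate_per_min * BACKOFF_FACTOR
    } else {
        base_rate_per_min
    }
}
```

Each null-routed measurement holds a slot for the full 5 seconds, compared with about 28ms for a typical same-region connect, so even a moderate prevalence can dominate the queue's total wait time.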


For the control server comparison methodology that provides the baseline TCP connect times: The Voidly Control Server: How We Tell Censorship from a Bad Network →

For the full HTTP/HTTPS measurement lifecycle that builds on the TCP layer: How Voidly measures HTTP and HTTPS censorship: the full protocol lifecycle from DNS through TLS to body comparison →

For the TLS layer that sits above TCP and uses its connected socket: How Voidly measures TLS-layer censorship: handshake forensics, certificate chain validation, and MITM detection →

For the anomaly classifier that consumes the TCP features documented here: The Voidly anomaly classifier: five interference classes and why we optimize for recall →