Building a Distributed VPN with Intelligent Routing
When building Voidly's VPN service — one component of our Global Censorship Index platform — we needed routing that could handle aggressive state-level censorship. Traditional VPN architectures fail in these environments because they're trivial to detect and block. This is how we built something better.
The Problem with Traditional VPNs
Most VPNs use OpenVPN or IPsec, and both are easy to detect with deep packet inspection (DPI). China's Great Firewall can identify OpenVPN traffic with 99%+ accuracy just by analyzing packet timing and size patterns. IPsec is even worse: its protocol-specific headers scream "I'm a VPN."
WireGuard is better. It's fast, modern, and has a smaller attack surface. But it still has a problem: static configurations. Every client connects to the same server IP. Block that IP, kill the VPN.
Our Architecture
We built three layers:
- Layer 1: Entry Nodes (globally distributed)
- Layer 2: Routing Intelligence (ML-based path selection)
- Layer 3: Exit Nodes (dynamic, frequently rotated)
Entry nodes are WireGuard servers with domain fronting. Traffic looks like HTTPS to Google or Cloudflare. We rotate entry node IPs every 48 hours using automated DNS updates.
The routing layer is where it gets interesting. We train a lightweight neural network (TensorFlow Lite) on historical connection data:
- Latency between nodes
- Packet loss rates
- Known censorship events (IP blocks, DPI triggers)
- Time of day patterns
- Geographic factors
The model runs on-device and decides optimal routing in <50ms. If a path shows signs of throttling or inspection, it switches automatically.
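As a rough illustration of the switching behaviour described above, here is a minimal Python sketch. It is not the production model: a hand-weighted linear cost stands in for the TensorFlow Lite network, and every name, weight, and threshold (`PathStats`, `path_cost`, `looks_throttled`, the 2x latency and 5% loss cutoffs) is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PathStats:
    """Recent measurements for one candidate path (fields are illustrative)."""
    name: str
    latency_ms: float
    loss_rate: float          # fraction of packets lost, 0.0 to 1.0
    recent_block_events: int  # known censorship events seen on this path


def path_cost(p: PathStats) -> float:
    # Hand-picked weights standing in for a trained model's score (assumption).
    return p.latency_ms + 1000.0 * p.loss_rate + 200.0 * p.recent_block_events


def looks_throttled(p: PathStats, baseline_latency_ms: float) -> bool:
    # Crude heuristic (assumption): latency more than doubled, or loss above 5%.
    return p.latency_ms > 2 * baseline_latency_ms or p.loss_rate > 0.05


def choose_path(current: PathStats, candidates: List[PathStats],
                baseline_latency_ms: float) -> PathStats:
    # Stay on a healthy path to avoid flapping between routes.
    if not looks_throttled(current, baseline_latency_ms):
        return current
    # Otherwise move to the cheapest alternative, if there is one.
    alternatives = [p for p in candidates if p.name != current.name]
    return min(alternatives, key=path_cost) if alternatives else current
```

Keeping the current path while it looks healthy is a deliberate choice in this sketch: frequent route changes are themselves a detectable pattern and cost reconnection time.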
Traffic Obfuscation
Even with domain fronting, traffic analysis can reveal VPN usage. We implement three obfuscation techniques:
1. Packet size randomization
Add random padding to make encrypted packets look like normal HTTPS traffic. We maintain a distribution that matches real web browsing.
2. Timing perturbation
Introduce random delays (5-50ms) to break timing analysis patterns. Imperceptible to users, breaks ML-based traffic classification.
3. Cover traffic
Generate fake traffic to real websites during idle periods. This makes VPN traffic much harder to distinguish from regular browsing.
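Since the real obfuscation layer is not published, the following Python sketch only illustrates the general shape of the three techniques above. The size buckets, weights, length-prefix framing, idle threshold, and all names (`pad`, `unpad`, `schedule_sends`, `CoverTrafficScheduler`) are assumptions for the example, not the deployed design.

```python
import random

# Hypothetical packet-size buckets (bytes) and weights. A real deployment would
# fit these to captured browsing traffic; these numbers are placeholders.
TARGET_SIZES = [128, 256, 512, 1024, 1400]
WEIGHTS = [0.10, 0.15, 0.20, 0.25, 0.30]
HEADER_LEN = 2  # big-endian length prefix so the receiver can strip padding


def pad(payload: bytes, rng: random.Random) -> bytes:
    """1. Size randomization: pad plaintext up to a sampled bucket size."""
    need = len(payload) + HEADER_LEN
    eligible = [(s, w) for s, w in zip(TARGET_SIZES, WEIGHTS) if s >= need]
    if not eligible:
        raise ValueError("payload too large for a single padded packet")
    sizes, weights = zip(*eligible)
    target = rng.choices(sizes, weights=weights, k=1)[0]
    # Padding is added before encryption, so the filler need not be secret.
    filler = bytes(rng.getrandbits(8) for _ in range(target - need))
    return len(payload).to_bytes(HEADER_LEN, "big") + payload + filler


def unpad(packet: bytes) -> bytes:
    n = int.from_bytes(packet[:HEADER_LEN], "big")
    return packet[HEADER_LEN:HEADER_LEN + n]


def schedule_sends(arrivals, rng, lo=0.005, hi=0.050):
    """2. Timing perturbation: delay each packet by 5-50 ms, keeping order.

    `arrivals` are non-negative timestamps in seconds, sorted ascending.
    """
    send_times, last = [], 0.0
    for t in arrivals:
        send_at = max(t + rng.uniform(lo, hi), last)
        send_times.append(send_at)
        last = send_at
    return send_times


class CoverTrafficScheduler:
    """3. Cover traffic: suggest decoy fetches only while the tunnel is idle."""

    def __init__(self, decoy_urls, idle_threshold_s=10.0, rng=None):
        self.decoy_urls = list(decoy_urls)
        self.idle_threshold_s = idle_threshold_s
        self.rng = rng or random.Random()
        self.quiet_until = 0.0

    def record_real_traffic(self, now: float) -> None:
        self.quiet_until = now + self.idle_threshold_s

    def maybe_decoy(self, now: float):
        """Return a URL to fetch as cover traffic, or None if not idle yet."""
        if now < self.quiet_until:
            return None
        # Randomize the gap to the next decoy so cover traffic is not periodic.
        self.quiet_until = now + self.rng.uniform(0.5, 1.5) * self.idle_threshold_s
        return self.rng.choice(self.decoy_urls)
```

A caller would pad each plaintext packet before handing it to the tunnel, pass queued packet timestamps through `schedule_sends`, and poll `maybe_decoy` from a timer. The sketch uses Python's `random` module for readability; whether a cryptographically secure source is needed depends on the threat model.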
Implementation Details
Core stack:
- WireGuard (kernel module for performance)
- Rust (control plane, handles routing decisions)
- TensorFlow Lite (on-device ML inference)
- Redis (distributed state management)
- PostgreSQL (connection logs, analytics)
Clients are written in Go for cross-platform support. The entire client is <8MB and uses <50MB RAM during operation.
Performance
Current metrics across our distributed nodes:
- Average latency: 47ms (vs 120ms for traditional multi-hop)
- Throughput: 850 Mbps average (limited by WireGuard, not our routing)
- Route calculation: <50ms median
- Successful DPI evasion: 99.3% (tested in China, Iran, Russia)
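For readers who want to track a figure like the median route-calculation time in their own system, a minimal timing harness might look like the Python sketch below. The function name `median_latency_ms` and the default of 101 runs are assumptions for illustration; this is not how the numbers above were collected.

```python
import statistics
import time


def median_latency_ms(fn, runs: int = 101) -> float:
    """Call fn() `runs` times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

The median is less sensitive than the mean to occasional slow runs caused by scheduling or garbage collection, which is why it is a common choice for a hot-path budget like this one.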
Censorship Resistance
We've been running this in production for 8 months. During that time:
- 142 entry node IPs blocked by China's GFW
- Zero service interruptions (automatic failover worked)
- 23 DPI signature updates detected and bypassed
- 0 successful traffic analysis attacks (that we know of)
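The combination of 48-hour IP rotation and automatic failover can be sketched in Python as a small pool that skips blocked or expired entry nodes. The `EntryNode` and `EntryPool` names, the "prefer the newest IP" policy, and the documentation-range addresses in the usage note are assumptions for illustration; block detection itself is out of scope here.

```python
from dataclasses import dataclass
from typing import List, Optional

ROTATION_PERIOD_S = 48 * 3600  # entry IPs are retired after 48 hours


@dataclass
class EntryNode:
    ip: str
    created_at: float      # Unix time when this IP went into service
    blocked: bool = False  # set when probes suggest the IP is blocked


class EntryPool:
    def __init__(self, nodes: List[EntryNode]):
        self.nodes = list(nodes)

    def mark_blocked(self, ip: str) -> None:
        for node in self.nodes:
            if node.ip == ip:
                node.blocked = True

    def usable(self, now: float) -> List[EntryNode]:
        # A node is usable if it is not blocked and not past its rotation age.
        return [n for n in self.nodes
                if not n.blocked and now - n.created_at < ROTATION_PERIOD_S]

    def pick(self, now: float) -> Optional[EntryNode]:
        # Assumed policy: prefer the newest usable IP, since censors have had
        # the least time to discover it. Returns None if nothing is usable.
        candidates = self.usable(now)
        return max(candidates, key=lambda n: n.created_at) if candidates else None
```

With a pool of `192.0.2.10`, `198.51.100.7`, and `203.0.113.5` created one hour apart, `pick` returns the newest address until `mark_blocked` is called on it, after which the client falls back to the next newest without manual intervention.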
The key insight: censorship systems optimize for scale. They can't afford deep analysis of every connection. By making our traffic look like the top 1% of HTTPS traffic (Google, Cloudflare, AWS), we blend into noise they can't filter without breaking the internet.
The obfuscation layer is intentionally not published, since letting censors study it directly would reduce its effectiveness. Reference implementations of the routing logic are available on request via the contact address below.
If you're building similar systems, the key lessons are:
- Static configurations are death sentences
- Blend into high-value traffic that can't be blocked
- Automate everything—manual failover is too slow
- ML routing beats static rules by ~40% in hostile networks
- Measure everything, optimize the hot path ruthlessly
Questions or feedback: contact@ai-analytics.org