Swarm SDK mesh transport: reliable delivery over contested RF links
The Swarm SDK gossip mesh assumes it can send a frame to a peer and have it delivered. In practice, radio links between drones operating in contested RF environments see 5–30% packet loss under normal conditions, rising to 60% or more under active jamming. The transport layer's job is to provide reliable delivery guarantees to the gossip layer without incurring TCP's head-of-line blocking or three-way handshake overhead. Drones cannot afford the latency penalty of TCP's congestion control on low-latency command links.
Why not TCP?
TCP is the default answer to reliable delivery over lossy links, and it is the wrong answer for drone mesh networking. Three fundamental problems make TCP unsuitable.
Head-of-line blocking. A single lost packet stalls all subsequent packets until the lost one is retransmitted. For real-time telemetry, this means a dropped heartbeat blocks position updates that arrive after it — unacceptable latency on command links where staleness is safety-critical. A drone receiving stale position data from a swarm peer cannot make correct formation-keeping decisions.
Connection mobility. TCP connections are tied to a (source IP, source port, dest IP, dest port) 4-tuple. When a drone changes RF channel during frequency hopping or loses and regains link, the TCP connection must be torn down and re-established via a three-way handshake. For a swarm that changes links constantly — frequency hopping at 50–200 hops per second is common in contested environments — the TCP handshake overhead is prohibitive.
Embedded resource constraints. TCP requires per-connection state (congestion window, receive buffer, retransmit queue) that exceeds the available RAM on STM32H7 targets for large swarm sizes. A 64-drone swarm would require 63 concurrent TCP connections per node. At 32 KB per connection, the upper end of typical TCP buffer sizes (4–32 KB), that is about 2 MB, double the H7's 1 MB of on-chip SRAM (of which DTCM is only 128 KB), before any application state is allocated; even at 4 KB per connection the buffers alone take about a quarter of it.
The MeshTransport layer
The Swarm SDK implements a custom transport called MeshTransport. It runs over UDP, maintains per-peer send and receive windows, and provides reliable ordered delivery without connection state tied to IP addresses. Connections survive IP changes because peer identity is tracked by PeerId (a 16-byte node identifier), not by socket address.
```rust
use std::collections::BTreeMap;
use std::sync::Arc;
use std::time::Instant;

// External crates assumed here: dashmap (concurrent map), tokio (async UDP
// socket and channels) and rand (SmallRng, behind rand's `small_rng` feature).
use dashmap::DashMap;
use rand::rngs::SmallRng;
use tokio::net::UdpSocket;
use tokio::sync::mpsc;

pub struct MeshTransport {
    udp_socket: Arc<UdpSocket>,
    peer_states: DashMap<PeerId, PeerTransportState>,
    inbound_queue: mpsc::Sender<InboundFrame>, // decoded frames handed up to the gossip layer
    outbound_queue: mpsc::Receiver<OutboundFrame>, // frames queued by the gossip layer
    retransmit_queue: BTreeMap<RetransmitKey, PendingFrame>,
    rng: SmallRng,
}

pub struct PeerTransportState {
    pub send_window_base: u32, // oldest unacknowledged sequence number
    pub send_window_next: u32, // next sequence number to assign
    pub recv_window_base: u32, // next expected inbound sequence number
    pub reorder_buffer: BTreeMap<u32, ReceivedFrame>, // out-of-order frames
    pub last_ack_time: Instant,
    pub rtt_estimate_ms: u32, // EWMA RTT estimate
}
```

The DashMap over PeerId allows concurrent access from the send and receive tasks without a global lock. Each peer's PeerTransportState is independently managed, so a degraded link to one peer does not affect the transport state for other peers in the swarm.
The retransmit queue is a BTreeMap keyed by RetransmitKey (peer_id + sequence number), so its entries are ordered by peer and then by sequence number. Each pending frame carries its own next_retransmit_at deadline; the transport poll loop scans the map and fires every retransmit whose deadline has passed, avoiding a separate timer per pending frame.
Frame types and wire format
MeshTransport defines four frame types, encoded in a single leading byte:
- DATA (0x01): carries payload data; includes a sequence number and payload bytes.
- ACK (0x02): cumulative acknowledgement up to ack_seq, plus a selective ACK bitmap for received-ahead frames.
- NACK (0x03): requests specific retransmit of missing sequence numbers.
- HEARTBEAT (0x04): keepalive when no data is flowing, used to measure RTT and maintain peer liveness.
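A minimal sketch of decoding the leading type byte, assuming a Rust enum named FrameType (a hypothetical name; the source only specifies the byte values). Unknown values are rejected so a corrupted frame can be dropped before any further parsing.

```rust
use std::convert::TryFrom;

/// Frame type carried in the single leading byte of every frame.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum FrameType {
    Data = 0x01,
    Ack = 0x02,
    Nack = 0x03,
    Heartbeat = 0x04,
}

impl TryFrom<u8> for FrameType {
    /// The unrecognized byte is returned so the caller can log it.
    type Error = u8;

    fn try_from(byte: u8) -> Result<Self, Self::Error> {
        match byte {
            0x01 => Ok(FrameType::Data),
            0x02 => Ok(FrameType::Ack),
            0x03 => Ok(FrameType::Nack),
            0x04 => Ok(FrameType::Heartbeat),
            other => Err(other),
        }
    }
}
```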
The wire format for a DATA frame:
| frame_type (1B) | peer_id (16B) | seq_num (4B) | frag_total (1B) | frag_index (1B) | payload_len (2B) | payload (N bytes) |
Total header overhead is 25 bytes (1 + 16 + 4 + 1 + 1 + 2). Each frame travels inside a MAVLink v2 message with a 253-byte payload budget (MAVLink v2 caps message payloads at 255 bytes), so each DATA frame leaves 253 − 25 = 228 bytes for application payload. Larger messages are handled by the fragmentation mechanism described below.
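The 25-byte layout can be checked with a small encoder and decoder. This is a sketch, not the SDK's serializer: DataHeader, encode_data and decode_data are hypothetical names, peer_id is treated as a raw 16-byte array, and multi-byte fields are assumed to be big-endian, which the wire diagram does not specify.

```rust
/// DATA header size: frame_type 1 + peer_id 16 + seq_num 4 + frag_total 1
/// + frag_index 1 + payload_len 2.
pub const DATA_HEADER_LEN: usize = 25;
/// Per-frame budget inside the carrying MAVLink v2 message.
pub const FRAME_BUDGET: usize = 253;
/// Application bytes left per DATA frame.
pub const MAX_PAYLOAD: usize = FRAME_BUDGET - DATA_HEADER_LEN;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DataHeader {
    pub peer_id: [u8; 16],
    pub seq_num: u32,
    pub frag_total: u8,
    pub frag_index: u8,
    pub payload_len: u16,
}

/// Serializes a DATA frame (big-endian multi-byte fields are an assumption).
pub fn encode_data(h: &DataHeader, payload: &[u8]) -> Option<Vec<u8>> {
    if payload.len() > MAX_PAYLOAD || payload.len() != h.payload_len as usize {
        return None;
    }
    let mut out = Vec::with_capacity(DATA_HEADER_LEN + payload.len());
    out.push(0x01); // frame_type = DATA
    out.extend_from_slice(&h.peer_id);
    out.extend_from_slice(&h.seq_num.to_be_bytes());
    out.push(h.frag_total);
    out.push(h.frag_index);
    out.extend_from_slice(&h.payload_len.to_be_bytes());
    out.extend_from_slice(payload);
    Some(out)
}

/// Parses a DATA frame, returning the header and a borrowed payload slice.
pub fn decode_data(buf: &[u8]) -> Option<(DataHeader, &[u8])> {
    if buf.len() < DATA_HEADER_LEN || buf[0] != 0x01 {
        return None;
    }
    let mut peer_id = [0u8; 16];
    peer_id.copy_from_slice(&buf[1..17]);
    let seq_num = u32::from_be_bytes([buf[17], buf[18], buf[19], buf[20]]);
    let frag_total = buf[21];
    let frag_index = buf[22];
    let payload_len = u16::from_be_bytes([buf[23], buf[24]]);
    // Reject frames whose declared payload length overruns the buffer.
    let payload = buf.get(DATA_HEADER_LEN..DATA_HEADER_LEN + payload_len as usize)?;
    let header = DataHeader { peer_id, seq_num, frag_total, frag_index, payload_len };
    Some((header, payload))
}
```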
Sliding window ARQ
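The transport uses a Go-Back-N variant with a selective ACK (SACK) extension. The SACK extension is critical for contested RF: pure Go-Back-N acknowledges only up to the first missing frame, so after a loss it retransmits every frame from that point onward, because it cannot tell which frames in the window were actually lost and which merely arrived out of order. Under 20% packet loss, that spends a large share of an already constrained link on frames the receiver already holds.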
The key parameters:
- Window size: 8 frames by default, configurable up to 16. Larger windows improve throughput on high-latency links but increase retransmit cost under loss.
- Retransmit timeout: starts at max(50 ms, 2 × rtt_estimate_ms) and doubles on each retransmit (exponential backoff).
- Maximum retransmits: 3 per frame before the peer is marked unreachable for that frame.
- SACK bitmap: the ACK frame carries a 16-bit bitmap indicating which of the next 16 frames beyond the cumulative ACK boundary have already been received out-of-order. This allows the sender to skip retransmit of any frame whose bit is set.
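The SACK bitmap can be sketched as two pure functions. This assumes ack_seq is the receiver's next expected sequence number (so it is itself missing) and that bit i covers ack_seq + 1 + i; the source does not pin down the bit numbering. build_sack and frames_to_retransmit are hypothetical names, and a BTreeSet of sequence numbers stands in for the per-peer reorder buffer.

```rust
use std::collections::BTreeSet;

/// Receiver side. `next_expected` is the cumulative ACK boundary: every
/// sequence number before it has arrived, and `next_expected` itself has not.
/// Bit i is set when `next_expected + 1 + i` is already buffered out of order.
pub fn build_sack(next_expected: u32, buffered: &BTreeSet<u32>) -> u16 {
    let mut bitmap = 0u16;
    for i in 0..16u32 {
        if buffered.contains(&next_expected.wrapping_add(1 + i)) {
            bitmap |= 1u16 << i;
        }
    }
    bitmap
}

/// Sender side. Of the unacknowledged frames in [next_expected, send_next),
/// keep only those the SACK bitmap does not report as received.
pub fn frames_to_retransmit(next_expected: u32, sack: u16, send_next: u32) -> Vec<u32> {
    let in_flight = send_next.wrapping_sub(next_expected);
    (0..in_flight)
        // Offset 0 is the hole at the ACK boundary; offsets past 16 are not covered.
        .filter(|&offset| offset == 0 || offset > 16 || sack & (1u16 << (offset - 1)) == 0)
        .map(|offset| next_expected.wrapping_add(offset))
        .collect()
}
```

For example, with next_expected = 10 and frames 12, 13 and 15 buffered, the bitmap is 0b1_0110, and a sender with frames 10 through 17 in flight keeps only 10, 11, 14, 16 and 17 as retransmit candidates.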
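```rust
fn process_retransmit_queue(&mut self, now: Instant) {
    let mut to_retransmit = Vec::new();
    // Borrow peer_states separately so the closure below does not capture all of `self`.
    let peer_states = &self.peer_states;
    // The map is ordered by (peer_id, seq_num), not by deadline, so every
    // pending frame's deadline is checked on each poll.
    self.retransmit_queue.retain(|key, frame| {
        if now < frame.next_retransmit_at {
            return true; // not due yet
        }
        if frame.retransmit_count >= MAX_RETRANSMITS {
            // Give up: record the undeliverable frame and drop it from the queue.
            if let Some(mut state) = peer_states.get_mut(&key.peer_id) {
                state.mark_unreachable(key.seq_num);
            }
            return false;
        }
        frame.retransmit_count += 1;
        // retransmit_timeout() doubles with each attempt (exponential backoff).
        frame.next_retransmit_at = now + frame.retransmit_timeout();
        to_retransmit.push(frame.clone());
        true
    });
    for frame in to_retransmit {
        self.send_raw_frame(&frame);
    }
}
```

The retransmit loop checks every pending frame's next_retransmit_at on each poll. Frames that exceed MAX_RETRANSMITS (3) are removed from the queue instead of being retransmitted again; mark_unreachable records the highest undeliverable sequence number so the gossip layer can detect link failure and re-route through a different peer. Removing them matters: an abandoned frame left in the queue would be re-examined and re-reported as unreachable on every poll.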
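The timeout arithmetic behind retransmit_timeout() can be sketched as free functions under the rule stated above: a base of max(50 ms, 2 × rtt_estimate_ms) that doubles on each attempt. The function names, the attempt numbering and the cap on the shift are assumptions for illustration.

```rust
use std::time::Duration;

/// Floor on the base timeout: max(50 ms, 2 x RTT estimate).
const MIN_RTO_MS: u64 = 50;

/// Timeout armed before retransmit number `attempt + 1` (attempt 0 is the
/// timer set when the frame is first sent). Doubles on every attempt.
pub fn retransmit_timeout(rtt_estimate_ms: u32, attempt: u32) -> Duration {
    let base_ms = MIN_RTO_MS.max(2 * u64::from(rtt_estimate_ms));
    // Cap the shift so a runaway attempt count cannot overflow.
    Duration::from_millis(base_ms << attempt.min(16))
}

/// Elapsed time from the original send at which each retransmit fires.
pub fn retransmit_schedule(rtt_estimate_ms: u32, max_retransmits: u32) -> Vec<Duration> {
    let mut elapsed = Duration::ZERO;
    (0..max_retransmits)
        .map(|attempt| {
            elapsed += retransmit_timeout(rtt_estimate_ms, attempt);
            elapsed
        })
        .collect()
}
```

With an RTT estimate under 25 ms the 50 ms floor applies, so the three retransmits fire 50, 150 and 350 ms after the original send.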
RTT estimation
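Accurate RTT estimation is essential for setting the retransmit timeout correctly. Too short a timeout causes spurious retransmits that consume radio bandwidth; too long a timeout causes unnecessary delivery delay under loss. The transport smooths RTT samples with the same EWMA that RFC 6298 specifies for TCP's smoothed RTT (gain alpha = 1/8). Unlike RFC 6298, it does not track RTT variance; the retransmit timeout is derived directly from the smoothed estimate as max(50 ms, 2 × rtt_estimate_ms):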
```rust
fn update_rtt(&mut self, peer_id: &PeerId, measured_rtt_ms: u32) {
    // Ignore samples for unknown peers rather than panicking on unwrap().
    let Some(mut state) = self.peer_states.get_mut(peer_id) else {
        return;
    };
    // EWMA with alpha = 1/8 (RFC 6298's smoothed-RTT gain, as in TCP)
    state.rtt_estimate_ms = (7 * state.rtt_estimate_ms + measured_rtt_ms) / 8;
    // Minimum: 20 ms (radio propagation floor); maximum: 500 ms (degraded link)
    state.rtt_estimate_ms = state.rtt_estimate_ms.clamp(20, 500);
}
```

RTT samples are collected from ACK frames: each ACK includes a timestamp echo of the DATA frame's send time, allowing the sender to compute round-trip time on receipt of the ACK. HEARTBEAT frames are also ACKed and contribute RTT samples when no data is flowing.
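A sketch of taking one RTT sample from the timestamp echo and feeding it into the same EWMA. It assumes the echoed send time is a 32-bit millisecond counter from the sender's own clock (the ACK wire format is not given here); rtt_from_echo and ewma_step are hypothetical names.

```rust
/// RTT sample from an ACK's timestamp echo. Both values come from the
/// sender's own millisecond clock, so no clock synchronization is needed;
/// wrapping_sub keeps the result correct across the u32 rollover (~49.7 days).
pub fn rtt_from_echo(now_ms: u32, echoed_send_ms: u32) -> u32 {
    now_ms.wrapping_sub(echoed_send_ms)
}

/// One EWMA step with alpha = 1/8, clamped to 20..=500 ms as in update_rtt.
/// Computed in u64 so an outlier sample cannot overflow the multiply.
pub fn ewma_step(estimate_ms: u32, sample_ms: u32) -> u32 {
    ((7 * u64::from(estimate_ms) + u64::from(sample_ms)) / 8).clamp(20, 500) as u32
}
```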
The 20 ms floor reflects real radio propagation characteristics: even at close range, the MAC layer scheduling, radio frequency switching, and hardware interrupt latency on STM32H7 targets impose at least 15–20 ms of irreducible round-trip time. The 500 ms ceiling prevents the retransmit timeout from growing too large during severely degraded link conditions where RTT samples are noisy — on a link with 60% packet loss, measured RTTs are dominated by the retransmit penalty rather than true propagation delay.
Fragmentation and reassembly
Gossip messages frequently exceed the 228-byte per-frame payload limit. A DeviceCertificate announcement is 340 bytes; a SealedSenderEnvelope carrying an encrypted group message can reach 1,136 bytes. The transport transparently fragments large payloads and reassembles them on the receiving end.
The frag_total and frag_index fields in the DATA frame header carry fragmentation metadata. A single-fragment message sets frag_total = 1 and frag_index = 0. The reassembly buffer is keyed by (peer_id, frag_group_id), where frag_group_id is derived from the first fragment's sequence number.
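A sketch of the sender-side split and the group-id derivation. It assumes the fragments of one message take consecutive sequence numbers, so any fragment can recover the first fragment's sequence number as seq_num minus frag_index; the source says only that frag_group_id is derived from the first fragment's sequence number. Fragment, fragment and frag_group_id are hypothetical names.

```rust
/// Application bytes per DATA frame (253-byte budget minus 25-byte header).
pub const MAX_PAYLOAD: usize = 228;

/// One fragment as it would be placed in a DATA frame.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Fragment {
    pub seq_num: u32,
    pub frag_total: u8,
    pub frag_index: u8,
    pub payload: Vec<u8>,
}

/// Splits `message` into fragments numbered consecutively from `first_seq`.
/// Returns None if more than 255 fragments are needed (frag_total is one byte).
pub fn fragment(message: &[u8], first_seq: u32) -> Option<Vec<Fragment>> {
    let chunks: Vec<&[u8]> = if message.is_empty() {
        vec![message] // an empty message still travels as one fragment
    } else {
        message.chunks(MAX_PAYLOAD).collect()
    };
    if chunks.len() > u8::MAX as usize {
        return None;
    }
    let total = chunks.len() as u8;
    Some(
        chunks
            .into_iter()
            .enumerate()
            .map(|(i, chunk)| Fragment {
                seq_num: first_seq.wrapping_add(i as u32),
                frag_total: total,
                frag_index: i as u8,
                payload: chunk.to_vec(),
            })
            .collect(),
    )
}

/// Recovers the first fragment's sequence number from any fragment.
pub fn frag_group_id(f: &Fragment) -> u32 {
    f.seq_num.wrapping_sub(u32::from(f.frag_index))
}
```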
Fragmentation characteristics by message type:
| Message type | Size | Fragments |
|---|---|---|
| Gossip heartbeat | 32 B | 1 |
| SenderKeyMessage | 180 B | 1 |
| DeviceCertificate announce | 340 B | 2 |
| SealedSenderEnvelope | up to 1,136 B | up to 5 |
Reassembly has a 200 ms timeout: if all fragments of a message do not arrive within 200 ms of the first fragment, the partial message is discarded and the sender is NACKed. The 200 ms window is generous relative to end-to-end frame latency (p99 22 ms for a single hop) but short enough that a stalled reassembly does not hold buffer space indefinitely. On heavily congested links where fragments of a five-fragment message are being retransmitted, 200 ms leaves room for two retransmit attempts at the minimum 50 ms timeout: because the timeout doubles on each attempt, the retransmits fire 50 ms and 150 ms after the original send, while a third would fire at 350 ms.
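A reassembly buffer with the 200 ms timeout might look like the sketch below. Reassembler, insert and expire are hypothetical names; PeerId is modeled as a 16-byte array, time is passed in explicitly so the logic is testable, and sending the NACK for expired groups is left to the caller.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

pub type PeerId = [u8; 16];
pub const REASSEMBLY_TIMEOUT: Duration = Duration::from_millis(200);

struct Partial {
    first_seen: Instant,
    parts: Vec<Option<Vec<u8>>>,
    received: usize,
}

/// Partial messages keyed by (peer_id, frag_group_id).
#[derive(Default)]
pub struct Reassembler {
    partials: HashMap<(PeerId, u32), Partial>,
}

impl Reassembler {
    /// Stores one fragment; returns the full message once every fragment is present.
    pub fn insert(&mut self, peer: PeerId, group_id: u32, frag_total: u8, frag_index: u8,
                  payload: &[u8], now: Instant) -> Option<Vec<u8>> {
        if frag_total == 0 || frag_index >= frag_total {
            return None; // malformed fragmentation header
        }
        let key = (peer, group_id);
        let entry = self.partials.entry(key).or_insert_with(|| Partial {
            first_seen: now,
            parts: vec![None; frag_total as usize],
            received: 0,
        });
        if entry.parts.len() != frag_total as usize {
            return None; // frag_total disagrees with earlier fragments of the group
        }
        let slot = &mut entry.parts[frag_index as usize];
        if slot.is_none() {
            *slot = Some(payload.to_vec());
            entry.received += 1;
        }
        if entry.received < entry.parts.len() {
            return None;
        }
        // Complete: concatenate fragments in frag_index order.
        let done = self.partials.remove(&key)?;
        Some(done.parts.into_iter().flatten().flatten().collect())
    }

    /// Drops partial messages older than 200 ms and returns their keys for NACKing.
    pub fn expire(&mut self, now: Instant) -> Vec<(PeerId, u32)> {
        let expired: Vec<(PeerId, u32)> = self
            .partials
            .iter()
            .filter(|(_, p)| now.duration_since(p.first_seen) >= REASSEMBLY_TIMEOUT)
            .map(|(k, _)| *k)
            .collect();
        for k in &expired {
            self.partials.remove(k);
        }
        expired
    }
}
```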
Out-of-order delivery
Radio links do not guarantee in-order delivery. A frame transmitted on one frequency can arrive before a frame transmitted slightly earlier on a different frequency. The transport maintains a reorder buffer per peer of up to 16 out-of-order frames.
If frame N+2 arrives before N+1, N+2 is held in the reorder buffer. When N+1 arrives, both N+1 and N+2 are delivered together to the gossip layer in sequence order. The gossip layer receives an in-order stream and never sees gaps during normal operation.
The reorder window timeout matches the reassembly timeout: 200 ms. If the gap is not filled within 200 ms, the buffered subsequent frames are delivered with a gap marker so the gossip layer can request retransmit at its layer. This two-level recovery — transport-layer retransmit as the primary mechanism, gossip-layer anti-entropy as the fallback — means message loss requires both the transport retransmit (up to 3 attempts) and the anti-entropy reconciliation to fail, which is an extremely unlikely compound event under all but the most severe RF denial.
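The reorder behavior can be sketched as follows, assuming a per-peer ReorderBuffer (a hypothetical type) that holds up to 16 frames ahead of the next expected sequence number and emits a Gap marker when the 200 ms timer expires. Frames behind the window, including duplicates, are dropped. Restarting the timer whenever a new gap remains after a delivery is an assumption; the source does not say exactly when the timer starts.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Frames that may be held ahead of the next expected sequence number.
pub const REORDER_CAPACITY: u32 = 16;
/// Time a gap may stay open before it is skipped.
pub const REORDER_TIMEOUT: Duration = Duration::from_millis(200);

/// What the transport hands up to the gossip layer.
#[derive(Debug, PartialEq, Eq)]
pub enum Delivery {
    Frame(u32, Vec<u8>),
    /// Sequence numbers from..to (exclusive) were skipped after the timeout.
    Gap { from: u32, to: u32 },
}

pub struct ReorderBuffer {
    next_expected: u32,
    held: BTreeMap<u32, Vec<u8>>,
    gap_since: Option<Instant>,
}

impl ReorderBuffer {
    pub fn new(first_seq: u32) -> Self {
        ReorderBuffer { next_expected: first_seq, held: BTreeMap::new(), gap_since: None }
    }

    /// Accepts one frame and returns whatever is now deliverable in order.
    pub fn push(&mut self, seq: u32, payload: Vec<u8>, now: Instant) -> Vec<Delivery> {
        let offset = seq.wrapping_sub(self.next_expected);
        let mut out = Vec::new();
        if offset == 0 {
            out.push(Delivery::Frame(seq, payload));
            self.next_expected = seq.wrapping_add(1);
            self.drain(&mut out, now);
        } else if offset <= REORDER_CAPACITY {
            // Out of order but inside the window: hold it and start the gap timer.
            self.held.insert(seq, payload);
            if self.gap_since.is_none() {
                self.gap_since = Some(now);
            }
        }
        // Anything else is a duplicate or too far ahead and is dropped.
        out
    }

    /// Skips a gap that has been open for 200 ms and flushes the held frames.
    pub fn poll(&mut self, now: Instant) -> Vec<Delivery> {
        let mut out = Vec::new();
        let expired = match self.gap_since {
            Some(t) => now.duration_since(t) >= REORDER_TIMEOUT,
            None => false,
        };
        if !expired {
            return out;
        }
        let base = self.next_expected;
        // Nearest held frame in wrapping sequence space.
        if let Some(first) = self.held.keys().copied().min_by_key(|k| k.wrapping_sub(base)) {
            out.push(Delivery::Gap { from: base, to: first });
            self.next_expected = first;
            self.drain(&mut out, now);
        }
        out
    }

    /// Delivers consecutive held frames starting at next_expected.
    fn drain(&mut self, out: &mut Vec<Delivery>, now: Instant) {
        while let Some(payload) = self.held.remove(&self.next_expected) {
            out.push(Delivery::Frame(self.next_expected, payload));
            self.next_expected = self.next_expected.wrapping_add(1);
        }
        // A new gap may remain behind the delivered run; restart its timer.
        self.gap_since = if self.held.is_empty() { None } else { Some(now) };
    }
}
```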
Multi-channel bonding
On platforms with dual-band radios — 2.4 GHz and 5.8 GHz — the transport can bond two links to increase resilience. Contested RF environments frequently target a single frequency band; a jammer that suppresses 5.8 GHz operation rarely simultaneously suppresses 2.4 GHz with the same effectiveness due to the different power levels required.
The bond configuration:
- Primary link: operational mesh frequency, typically 5.8 GHz (higher data rate, shorter range).
- Secondary link: backup frequency, 2.4 GHz (lower data rate, longer range — useful for swarms operating over extended terrain).
- Active-active mode: high-priority frames (command, revocation, key management) are duplicated on both links simultaneously. The receiver deduplicates on the received_seq_set per peer.
- Active-standby mode: all traffic flows on the primary link; the secondary is promoted automatically on primary link failure (detected by three consecutive missed HEARTBEATs at a 100 ms interval).
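Primary-link failure detection in active-standby mode can be sketched as a small monitor that promotes the secondary after 300 ms without a primary HEARTBEAT, i.e. three consecutive missed 100 ms intervals. LinkMonitor and ActiveLink are hypothetical names, and the sketch deliberately omits failing back to the primary, which the source does not describe.

```rust
use std::time::{Duration, Instant};

pub const HEARTBEAT_INTERVAL: Duration = Duration::from_millis(100);
pub const MISSED_BEFORE_FAILOVER: u32 = 3;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActiveLink {
    Primary,
    Secondary,
}

pub struct LinkMonitor {
    last_primary_heartbeat: Instant,
    active: ActiveLink,
}

impl LinkMonitor {
    pub fn new(now: Instant) -> Self {
        LinkMonitor { last_primary_heartbeat: now, active: ActiveLink::Primary }
    }

    /// Called whenever a HEARTBEAT arrives on the primary link.
    pub fn on_primary_heartbeat(&mut self, now: Instant) {
        self.last_primary_heartbeat = now;
    }

    /// Promotes the secondary once three heartbeat intervals pass in silence.
    pub fn poll(&mut self, now: Instant) -> ActiveLink {
        let silence = now.duration_since(self.last_primary_heartbeat);
        if self.active == ActiveLink::Primary
            && silence >= HEARTBEAT_INTERVAL * MISSED_BEFORE_FAILOVER
        {
            self.active = ActiveLink::Secondary;
        }
        self.active
    }
}
```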
Frame deduplication on receive is critical for active-active mode. Each PeerTransportState maintains a received_seq_set — a sliding window bitset of recently processed sequence numbers. A frame arriving on the secondary link with a sequence number already present in the set is silently dropped. The gossip layer never sees the duplicate.
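One plausible shape for received_seq_set is a sliding window similar to the IPsec anti-replay window: a 64-bit bitmap anchored at the highest sequence number seen. The 64-frame width, the ReceivedSeqSet name and the choice to treat frames older than the window as duplicates are assumptions, not details from the source.

```rust
/// Sliding-window record of recently processed sequence numbers.
#[derive(Default)]
pub struct ReceivedSeqSet {
    highest: Option<u32>,
    /// Bit i set means `highest - i` has been seen; the window is 64 wide.
    bits: u64,
}

impl ReceivedSeqSet {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns true if `seq` is new and should be delivered, false if it is a
    /// duplicate or older than the window and should be dropped silently.
    pub fn check_and_insert(&mut self, seq: u32) -> bool {
        let highest = match self.highest {
            Some(h) => h,
            None => {
                self.highest = Some(seq);
                self.bits = 1;
                return true;
            }
        };
        let ahead = seq.wrapping_sub(highest);
        if ahead != 0 && ahead < (1u32 << 31) {
            // Newer than anything seen: slide the window forward.
            self.bits = if ahead >= 64 { 0 } else { self.bits << ahead };
            self.bits |= 1;
            self.highest = Some(seq);
            return true;
        }
        let behind = highest.wrapping_sub(seq);
        if behind >= 64 {
            return false;
        }
        let mask = 1u64 << behind;
        if self.bits & mask != 0 {
            return false;
        }
        self.bits |= mask;
        true
    }
}
```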
Benchmarks
All benchmarks were collected on an STM32H7 at 480 MHz and a Jetson Nano (ARM Cortex-A57 at 1.43 GHz) with the full cryptographic stack active. Numbers represent transport-layer operations in isolation; end-to-end latency includes radio hardware latency on top.
MeshTransport benchmarks (platforms: STM32H7, Cortex-M7 at 480 MHz; Jetson Nano, ARM Cortex-A57 at 1.43 GHz):

| Operation | STM32H7 p50 | Jetson Nano p50 |
|---|---|---|
| Frame encode + send | 0.4 ms | 0.06 ms |
| SACK bitmap processing | 0.1 ms | n/a |
| Reassembly (5-fragment message) | 1.8 ms | 0.22 ms |

| Link metric | Value |
|---|---|
| End-to-end latency (1 hop, no loss) | p50 8 ms, p99 22 ms |
| Throughput at 10% packet loss | 340 Kbps effective; 850 Kbps raw (before retransmits) |
Frame encode at 0.4 ms p50 on the H7 includes header serialization, sequence number assignment, and the UDP send syscall. The Jetson Nano is about 6.7× faster (0.06 ms) due to its out-of-order A57 cores and significantly higher memory bandwidth.
Five-fragment reassembly at 1.8 ms p50 on the H7 covers the full path: five frame receives, buffer insertions, completeness check, and payload concatenation. This is the worst-case common operation — the 1,136-byte SealedSenderEnvelope — and completes well within the 200 ms reassembly timeout.
The throughput gap between raw (850 Kbps) and effective (340 Kbps) at 10% packet loss reflects the retransmit overhead of the ARQ mechanism. Each lost frame costs one retransmit timeout (minimum 50 ms) plus the retransmit frame itself. With a window of 8 frames and a p50 one-hop latency of 8 ms, the 50 ms minimum timeout spans more than two round trips, so a loss at the front of the window stalls the sender once the window fills. At 10% loss the ARQ layer therefore achieves roughly 40% of raw throughput (340 / 850 Kbps), consistent with Go-Back-N performance at this loss rate. The SACK extension recovers some of this gap relative to pure Go-Back-N by avoiding retransmit of frames that were received out of order.
For the gossip mesh protocol that runs on top of this transport (epidemic broadcast, VecDeque deduplication, and anti-entropy reconciliation): Swarm SDK gossip mesh: bounded fanout routing, message deduplication, and network partition handling
For how the transport-layer frames are wrapped in MAVLink v2 SWARM_MESH_FRAME messages for PX4 and ArduPilot integration: Swarm SDK MAVLink v2 integration: encrypting mesh messages inside 253-byte drone protocol frames
For the Double Ratchet forward secrecy that operates above the transport layer, including how session keys are ratcheted for every message: The Swarm SDK double ratchet: forward secrecy and post-compromise security in drone mesh networks
For the X3DH session establishment that creates the initial shared secret handed off to the Double Ratchet: Swarm SDK session establishment: X3DH prekey bundles and the initial drone-to-drone handshake