Swarm SDK v0.4: situational awareness, electronic warfare coordination, and adversarial resilience
v0.3 gave the Swarm SDK forward secrecy and sealed sender: session keys established through a hybrid ML-KEM-768 + X25519 key exchange, a Double Ratchet per session, and O(1) group encryption via Sender Keys so a 64-node mesh doesn't pay 64x key distribution overhead. That baseline is necessary but not sufficient for operational use. v0.4 addresses a harder problem: what does your mesh do when it's being actively jammed, when nodes go silent mid-mission, and when a rogue drone tries to inject false position data into the swarm's shared operational picture?
Four capability areas shipped in v0.4: the Situational Awareness API, the Electronic Warfare Coordination protocol, Adversarial Resilience features for degraded-channel operation, and the RF Fingerprinting subsystem. This post covers each in turn.
Situational Awareness API
The new SwarmSALayer sits above the existing encrypted mesh and maintains a shared operational picture across all nodes in the swarm. Every node broadcasts its own position, velocity, and heading at a configurable interval — 200 ms for tactical modes (fast-moving assets, close coordination), 1 s for loitering modes (persistent ISR, reduced RF footprint). The SA layer is a first-class concern, not a tacked-on telemetry channel: it has its own priority queue in the MAVLink v2 transport so SA updates are not head-of-line blocked by bulk telemetry.
Authenticity is enforced at the message level. Every SA update is signed with the originating node's current session key (Ed25519 over the serialized fields). A node that doesn't hold another node's private key cannot forge that node's position. The SA layer rejects unsigned or invalid-signature updates before they touch the shared operational picture. This matters more than it might seem: in a multi-platform mission where different nodes are operated by different organizations, you want cryptographic proof of origin, not trust based on message source address.
The full SA update struct:
use serde::Serialize;

#[derive(Debug, Clone, Serialize)]
pub struct SaUpdate {
    pub node_id: NodeId,
    pub timestamp_ns: u64,
    pub position: GeoPosition,       // lat/lon/alt in WGS84
    pub velocity: Velocity3D,        // m/s in NED frame
    pub heading_deg: f32,
    pub target_tracks: Vec<TargetTrack>,
    pub signature: SessionSignature, // Ed25519 over the above fields
}
The SA layer handles intermittent connectivity through dead-reckoning: if a node's last update is more than 500 ms old, the layer projects its likely position forward using the last known velocity vector and heading, flagging the projected entry as inferred. This keeps the operational picture useful during brief comms drops without falsely asserting position certainty. Dead-reckoned entries are marked with a confidence field that decays with time; entries older than 3 s without a fresh update trigger a node-silent alert.
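The dead-reckoning rule above can be sketched as follows. This is a minimal illustration, not the SDK's implementation: `LocalNed`, `SaEstimate`, and `dead_reckon` are hypothetical names, positions are in a flat local NED frame rather than WGS84, and the linear confidence decay is an assumption (the post only says that confidence decays with time).

```rust
use std::time::Duration;

/// Position in a local north-east-down frame, in metres.
/// Illustrative stand-in for the SDK's GeoPosition.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LocalNed {
    pub north: f64,
    pub east: f64,
    pub down: f64,
}

/// What the operational picture would show for one node.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SaEstimate {
    /// Last update is recent enough to use as-is.
    Fresh(LocalNed),
    /// Projected from the last known velocity; confidence is in [0, 1].
    Inferred { position: LocalNed, confidence: f64 },
    /// No update for more than 3 s: the caller should raise a node-silent alert.
    Silent,
}

const STALE_AFTER: Duration = Duration::from_millis(500);
const SILENT_AFTER: Duration = Duration::from_secs(3);

/// `velocity` is (north, east, down) in m/s; `age` is the time since the last update.
pub fn dead_reckon(last: LocalNed, velocity: (f64, f64, f64), age: Duration) -> SaEstimate {
    if age <= STALE_AFTER {
        return SaEstimate::Fresh(last);
    }
    if age > SILENT_AFTER {
        return SaEstimate::Silent;
    }
    let t = age.as_secs_f64();
    let position = LocalNed {
        north: last.north + velocity.0 * t,
        east: last.east + velocity.1 * t,
        down: last.down + velocity.2 * t,
    };
    // Assumed decay shape: linear from 1.0 at 500 ms down to 0.0 at 3 s.
    let span = (SILENT_AFTER - STALE_AFTER).as_secs_f64();
    let confidence = 1.0 - (age - STALE_AFTER).as_secs_f64() / span;
    SaEstimate::Inferred { position, confidence }
}
```

Under these assumptions, a node moving north at 10 m/s whose last update is 1.75 s old would be shown 17.5 m north of its last reported position with a confidence of 0.5.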
Sensor fusion is built into the same update path. If a node has an optical or RF sensor locked on a non-cooperative target, it publishes a TargetTrack in its SA update. Other nodes with overlapping coverage zones — determined by comparing sensor footprint polygons in the operational picture — contribute their own angle-of-arrival or range estimates. The SA layer runs a weighted-least-squares fusion over the contributing observations and updates the track's position estimate. Fusion weights are proportional to each node's sensor quality descriptor, which operators configure at deployment time.
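To make the weighting concrete, here is a simplified sketch of the fusion step. It assumes each contributing node has already converted its measurement into a position estimate in a shared local frame; in that special case, weighted least squares with scalar weights reduces to a weighted mean. The real pipeline fuses angle-of-arrival and range observations, which needs a proper measurement model. `Observation` and `fuse` are hypothetical names.

```rust
/// One node's estimate of a target's position in a shared local frame (metres),
/// plus that node's sensor quality descriptor, used here as the fusion weight.
#[derive(Debug, Clone, Copy)]
pub struct Observation {
    pub north: f64,
    pub east: f64,
    pub quality: f64,
}

/// Weighted mean of the observations, which is the weighted-least-squares
/// solution when every observation measures the position directly.
/// Returns None if no observation carries a usable (positive, finite) weight.
pub fn fuse(observations: &[Observation]) -> Option<(f64, f64)> {
    let usable = observations
        .iter()
        .filter(|o| o.quality > 0.0 && o.quality.is_finite());
    let (mut weighted_north, mut weighted_east, mut total_weight) = (0.0, 0.0, 0.0);
    for o in usable {
        weighted_north += o.quality * o.north;
        weighted_east += o.quality * o.east;
        total_weight += o.quality;
    }
    if total_weight > 0.0 {
        Some((weighted_north / total_weight, weighted_east / total_weight))
    } else {
        None
    }
}
```

A node with twice the quality descriptor pulls the fused estimate twice as hard toward its own observation, which is the behavior the deployment-time weights are meant to control.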
Electronic Warfare Coordination
Detection without coordination is only half the problem. If one node detects jamming on its active frequency band, every other node in the mesh needs to know — immediately, and in a tamper-evident way that prevents an adversary from fabricating false EW events to trigger unnecessary frequency hops.
v0.4 introduces the EwEvent message type. A node that detects interference on a frequency band (via carrier-to-noise drop, bit error rate spike, or explicit SDR interference classification) signs and broadcasts an EW event to the mesh. The event carries the affected frequency range (start/stop Hz), interference type (jamming, spoofing, or wideband noise), measured signal strength, and a duration estimate. Signature verification uses the same key store as the SA layer — a node must be a recognized mesh participant to issue EW events.
Anti-replay is handled with a monotonic sequence number per node. Each node tracks the last-seen EW sequence number for every peer; events with a repeated or retrograde sequence number are silently dropped. This closes the replay-attack vector where an adversary captures a legitimate EW event and re-broadcasts it later to trigger spurious frequency hops.
pub fn handle_ew_event(&mut self, event: EwEvent) -> Result<(), SwarmError> {
    // Verify signature; reject if node key is unknown or revoked
    event.verify(&self.key_store)?;
    // Anti-replay check
    if !self.ew_seq_tracker.is_new(event.node_id, event.seq) {
        return Err(SwarmError::ReplayDetected);
    }
    // Check if our active channel overlaps the affected range
    if self.radio.channel_overlaps(event.affected_range) {
        let hop = self.hop_plan.next_channel(self.session_id)?;
        self.radio.execute_hop(hop)?;
        tracing::warn!("EW event from {}: hopped to channel {}", event.node_id, hop);
    }
    Ok(())
}
Frequency hop plans are pre-negotiated at session establishment and encrypted under the session key material. Observing that a hop occurred, even from a position where you can see RF activity on the old channel go silent, does not reveal the new frequency. The plan is a deterministic sequence derived from the shared session secret, so any node that was present at session establishment can compute the next channel without a round-trip negotiation. Nodes that miss a hop event and fall out of sync execute a resync protocol that re-derives the current channel from session state.
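Returning to the anti-replay check in the handler: a tracker with the behavior described earlier (accept only strictly increasing sequence numbers per node) could look like the sketch below. `EwSeqTracker` and the `u32` node identifier are illustrative assumptions; the SDK's actual tracker type is not shown in this post.

```rust
use std::collections::HashMap;

/// Stand-in for the SDK's NodeId type.
pub type NodeId = u32;

/// Remembers the highest EW sequence number accepted from each peer.
#[derive(Debug, Default)]
pub struct EwSeqTracker {
    last_seen: HashMap<NodeId, u64>,
}

impl EwSeqTracker {
    /// Returns true and records `seq` if it is strictly greater than the last
    /// accepted sequence number for `node`. Repeated or retrograde numbers
    /// return false and leave the state untouched.
    pub fn is_new(&mut self, node: NodeId, seq: u64) -> bool {
        match self.last_seen.get(&node) {
            Some(&last) if seq <= last => false,
            _ => {
                self.last_seen.insert(node, seq);
                true
            }
        }
    }
}
```

One consequence of strict monotonicity is that a legitimate event delivered after a newer one from the same node is also dropped. For EW events that is usually acceptable, since the newer event supersedes the older one.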
Adversarial Resilience
Previous versions of the SDK assumed degraded-but-present connectivity: packet loss, high latency, node churn. v0.4 targets a harder threat model: adversarial degradation by an actor who knows they are attacking a Swarm SDK mesh specifically. That changes the design constraints significantly.
Traffic morphing. An adversary who can observe encrypted traffic and correlate packet sizes with what a drone is doing (hover vs. transit vs. approach, each of which produces different telemetry rates and payload sizes) can extract operationally significant information without breaking the encryption. We blunt this vector by padding every encrypted payload to one of six fixed sizes before transmission: 512, 1024, 2048, 4096, 8192, or 16384 bytes. Each payload is padded with random bytes up to the smallest tier that can hold it, so a payload that exceeds one tier by even a single byte is promoted to the next. An observer then learns only which of the six tiers a packet falls into, not its exact length. The cost is some bandwidth overhead, a tradeoff we consider acceptable for adversarial environments.
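The tier selection can be sketched in a few lines. This is an illustration under assumptions: `morph_target` and `pad_to_tier` are hypothetical names, the `fill` closure stands in for a cryptographically secure random byte source, and a real framing format would also need to record the original length so the receiver can strip the padding.

```rust
/// The six allowed on-air payload sizes, in bytes.
pub const MORPH_SIZES: [usize; 6] = [512, 1024, 2048, 4096, 8192, 16384];

/// Smallest tier that can hold a payload of `len` bytes, or None if the
/// payload exceeds the largest tier (handling of that case is not covered here).
pub fn morph_target(len: usize) -> Option<usize> {
    MORPH_SIZES.iter().copied().find(|&tier| tier >= len)
}

/// Pads `payload` up to its tier using bytes drawn from `fill`.
/// In production `fill` must be backed by a CSPRNG; a constant is only for tests.
pub fn pad_to_tier(mut payload: Vec<u8>, mut fill: impl FnMut() -> u8) -> Option<Vec<u8>> {
    let target = morph_target(payload.len())?;
    while payload.len() < target {
        payload.push(fill());
    }
    Some(payload)
}
```

For example, a 600-byte payload lands in the 1024-byte tier, so 424 bytes of padding are added. The worst-case overhead sits just above each tier boundary, where a payload roughly doubles in size.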
Timing jitter. Fixed inter-packet intervals are a timing side-channel: a persistent observer can confirm which packets belong to the same flow, and correlate transmission timing across multiple nodes to infer coordination events. Inter-packet timing is now randomized within ±15% of the configured interval using a cryptographically seeded PRNG, making timing-correlation attacks significantly more expensive without breaking the protocol's real-time guarantees.
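As a rough sketch of the jitter calculation, the function below maps one 64-bit random value onto an interval within ±15% of the configured one. `jittered_interval` is a hypothetical helper, and the random value is assumed to come from the caller's cryptographically seeded PRNG, which is outside the scope of this example.

```rust
use std::time::Duration;

/// Scales `base` by a factor in [1 - jitter, 1 + jitter).
/// `random` must come from a cryptographically seeded PRNG; `jitter` is a
/// fraction such as 0.15 and is expected to lie in [0, 1).
pub fn jittered_interval(base: Duration, jitter: f64, random: u64) -> Duration {
    // Use the top 53 bits so the value converts to f64 without rounding,
    // giving a uniform number in [0, 1).
    let unit = (random >> 11) as f64 / (1u64 << 53) as f64;
    // Map [0, 1) onto [-1, 1) and scale by the jitter fraction.
    let factor = 1.0 + jitter * (2.0 * unit - 1.0);
    base.mul_f64(factor)
}
```

With a 200 ms tactical interval and a jitter of 0.15, the resulting gap between packets falls between roughly 170 ms and 230 ms, so the mean update rate is preserved while individual send times stop lining up.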
Degraded-channel mode. If a node hasn't received a heartbeat from at least N peers (configurable, default 2) in the last 3 seconds, it switches to bandwidth-conserved mode. SA update frequency drops to 1 s regardless of configured mode, non-critical telemetry is suspended, and the node prioritizes EW event propagation and command messages. The goal is to keep the mesh minimally functional — and able to recover — when connectivity is actively being suppressed.
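One way to express the trigger is sketched below. It reads the rule as "enter degraded mode when fewer than N peers have been heard within the window", which is an interpretation rather than something the post states exactly. `should_degrade` and `effective_sa_interval` are hypothetical names.

```rust
use std::time::Duration;

/// `heartbeat_ages` holds, for each known peer, the time since its last heartbeat.
/// Returns true when fewer than `min_peers` peers were heard within `window`.
pub fn should_degrade(heartbeat_ages: &[Duration], min_peers: usize, window: Duration) -> bool {
    let live_peers = heartbeat_ages.iter().filter(|age| **age <= window).count();
    live_peers < min_peers
}

/// In degraded mode the SA interval is forced to 1 s regardless of the
/// configured mode; otherwise the configured interval is used unchanged.
pub fn effective_sa_interval(configured: Duration, degraded: bool) -> Duration {
    if degraded {
        Duration::from_secs(1)
    } else {
        configured
    }
}
```

With the defaults (N = 2, 3 s window), a node that has heard only one peer in the last three seconds would switch its 200 ms tactical SA broadcasts to 1 s.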
Store-and-forward. Messages that cannot be delivered immediately (no route to peer, peer silent) are queued in an encrypted ring buffer with a configurable TTL. The default is 30 seconds for tactical messages, 5 seconds for SA updates (stale position is worse than no position). When connectivity recovers, the queue drains in priority order. The buffer itself is encrypted under the session key — a physical capture of the device does not expose queued messages without the session key material.
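The queueing behavior (bounded buffer, per-message TTL, priority-ordered drain) can be sketched as follows. Encryption of the buffer is deliberately left out, the priority ordering is an assumption, and `StoreAndForward`, `Queued`, and `Priority` are hypothetical names. Timestamps are durations on a monotonic mission clock.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Lower variants drain first. The exact ordering here is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Priority {
    EwEvent,
    Command,
    SaUpdate,
    Telemetry,
}

pub struct Queued {
    pub priority: Priority,
    pub enqueued_at: Duration,
    pub ttl: Duration,
    pub payload: Vec<u8>,
}

/// Fixed-capacity buffer: when full, the oldest entry is overwritten.
pub struct StoreAndForward {
    capacity: usize,
    entries: VecDeque<Queued>,
}

impl StoreAndForward {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, entries: VecDeque::with_capacity(capacity) }
    }

    pub fn push(&mut self, msg: Queued) {
        if self.capacity == 0 {
            return;
        }
        if self.entries.len() == self.capacity {
            self.entries.pop_front(); // ring-buffer behavior: evict the oldest
        }
        self.entries.push_back(msg);
    }

    /// Empties the buffer, discarding expired messages and returning the rest
    /// in priority order (arrival order is kept within a priority level).
    pub fn drain(&mut self, now: Duration) -> Vec<Queued> {
        let mut live: Vec<Queued> = self
            .entries
            .drain(..)
            .filter(|m| now.saturating_sub(m.enqueued_at) <= m.ttl)
            .collect();
        live.sort_by_key(|m| m.priority); // stable sort
        live
    }
}
```

With a 5 s TTL on SA updates and 30 s on tactical messages, a link that recovers after 10 s would deliver the queued EW events and commands and silently discard the stale positions.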
pub struct ResilienceConfig {
    /// Traffic morphing: pad all payloads to nearest allowed size
    pub morph_sizes: &'static [usize], // [512, 1024, 2048, 4096, 8192, 16384]
    /// Timing jitter fraction (0.15 = ±15% of configured interval)
    pub timing_jitter: f32,
    /// Heartbeat timeout before entering degraded mode (seconds)
    pub degraded_trigger_secs: f32,
    /// TTL for store-and-forward queue (seconds)
    pub sfq_ttl_secs: u32,
}
We intentionally expose this as a flat config struct rather than encoding defaults in the type system, because the right values are mission-dependent. A mesh operating in a low-RF-threat environment might disable traffic morphing entirely to recover the bandwidth overhead. A mesh executing in a heavy-jamming environment might tighten the degraded-mode trigger to 1.5 s and extend the store-and-forward TTL to 60 s. Operators who want a safe default can use ResilienceConfig::tactical(), which applies the values shown above.
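As an illustration of the heavy-jamming adjustment described above, the sketch below redefines the struct locally so the example is self-contained. The `tactical()` body is a stand-in built from the defaults quoted in this post, and `heavy_jamming()` is a hypothetical helper, not an SDK function.

```rust
pub struct ResilienceConfig {
    pub morph_sizes: &'static [usize],
    pub timing_jitter: f32,
    pub degraded_trigger_secs: f32,
    pub sfq_ttl_secs: u32,
}

impl ResilienceConfig {
    /// Stand-in for the SDK constructor, using the defaults described in the post.
    pub fn tactical() -> Self {
        Self {
            morph_sizes: &[512, 1024, 2048, 4096, 8192, 16384],
            timing_jitter: 0.15,
            degraded_trigger_secs: 3.0,
            sfq_ttl_secs: 30,
        }
    }

    /// Hypothetical profile for a heavy-jamming mission: enter degraded mode
    /// sooner and hold undelivered messages longer. Everything else is inherited.
    pub fn heavy_jamming() -> Self {
        Self {
            degraded_trigger_secs: 1.5,
            sfq_ttl_secs: 60,
            ..Self::tactical()
        }
    }
}
```

Struct update syntax keeps the override list short, so a mission profile only spells out the fields it actually changes.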
RF Fingerprinting and Tracking
The RF Fingerprinting subsystem adds a passive non-cooperative tracking capability. Nodes equipped with a software-defined radio can now run an IQ sample analysis pipeline that extracts RF fingerprints from received signals — modulation type, timing characteristics, carrier offset, and spectral shape. These features are compared against the on-device fingerprint database to classify the emitter.
The fingerprint database ships with signatures for common commercial drone radio systems: DJI OcuSync (OcuSync 2.0 and 3.0 variants), Skydio Link, and Autel SkyLink. This enables passive detection of non-cooperative platforms operating in the same airspace without any active interrogation or emission — the node is purely listening. We want to be explicit about what this capability does and does not do: it detects and classifies; the SDK provides no active jamming or interference capability, and we have no plans to add one.
The integration point for SDR hardware is a trait:
pub trait RfFingerprintSensor: Send + Sync {
    /// Called with raw IQ samples at the configured rate.
    /// Returns a fingerprint if a non-noise emitter is detected.
    fn process_samples(&mut self, samples: &[Complex<f32>]) -> Option<RfFingerprint>;
}
// The SDK integrates fingerprints into the SA layer automatically:
// RfContact messages are merged into the SharedOperationalPicture
// alongside cooperative position broadcasts.
Classified emitters are published to the mesh as RfContact objects and merged into the SharedOperationalPicture alongside cooperative SA updates. Nodes with overlapping sensor footprints contribute independent angle-of-arrival estimates; the SA layer runs the same weighted-least-squares fusion it uses for cooperative target tracks. In testing with two nodes separated by 400 m, triangulation error on a stationary DJI OcuSync 3.0 emitter was under 12 m CEP at 1 km range, well within operationally useful bounds.
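For intuition about the two-node geometry, the sketch below intersects two noise-free bearings in a flat local east/north plane. It is not the SDK's fusion code, which weights multiple noisy observations; `triangulate` is a hypothetical helper and bearings are assumed to be measured in degrees clockwise from north.

```rust
/// Unit direction (east, north) for a bearing in degrees clockwise from north.
fn direction(bearing_deg: f64) -> (f64, f64) {
    let rad = bearing_deg.to_radians();
    (rad.sin(), rad.cos())
}

/// Intersects the bearing ray from node `a` with the bearing ray from node `b`.
/// Positions are (east, north) in metres. Returns None when the bearings are
/// (nearly) parallel or the intersection lies behind either node.
pub fn triangulate(
    a: (f64, f64),
    bearing_a_deg: f64,
    b: (f64, f64),
    bearing_b_deg: f64,
) -> Option<(f64, f64)> {
    let (da_e, da_n) = direction(bearing_a_deg);
    let (db_e, db_n) = direction(bearing_b_deg);
    // 2D cross product of the two directions; zero means parallel rays.
    let denom = da_e * db_n - da_n * db_e;
    if denom.abs() < 1e-9 {
        return None;
    }
    let (dx, dy) = (b.0 - a.0, b.1 - a.1);
    // Solve a + t * da = b + s * db for the ray parameters t and s.
    let t = (dx * db_n - dy * db_e) / denom;
    let s = (dx * da_n - dy * da_e) / denom;
    if t < 0.0 || s < 0.0 {
        return None;
    }
    Some((a.0 + t * da_e, a.1 + t * da_n))
}
```

With nodes at (0, 0) and (400, 0) and an emitter at (200, 1000), the two bearings differ by only about 23 degrees. That shallow crossing angle is why small angle-of-arrival errors turn into metre-scale position error at 1 km range on a 400 m baseline.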
Integrators who want to use their own SDR hardware implement the RfFingerprintSensor trait and register it with the SA layer. The trait requires Send + Sync, which lets the SDK run the sensor on a dedicated thread so that sample processing doesn't block the mesh communication loop. We ship a reference implementation against the RTL-SDR for evaluation use; production integrations typically use hardware with better dynamic range and lower latency.
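A toy implementation shows the shape of an integration. Everything here besides the trait signature is an assumption made so the example compiles on its own: `Complex` and `RfFingerprint` are local stand-ins for the real types, `EnergyDetector` is a hypothetical sensor that only thresholds mean power (a real one would extract modulation, timing, and spectral features), and `run_on_thread` merely illustrates why the Send bound matters.

```rust
use std::thread;

/// Local stand-in for the complex sample type used by the SDK.
#[derive(Debug, Clone, Copy)]
pub struct Complex<T> {
    pub re: T,
    pub im: T,
}

/// Local stand-in; the real fingerprint carries far more than mean power.
#[derive(Debug, Clone, PartialEq)]
pub struct RfFingerprint {
    pub mean_power: f32,
}

pub trait RfFingerprintSensor: Send + Sync {
    fn process_samples(&mut self, samples: &[Complex<f32>]) -> Option<RfFingerprint>;
}

/// Hypothetical sensor: reports an emitter when mean sample power exceeds a threshold.
pub struct EnergyDetector {
    pub threshold: f32,
}

impl RfFingerprintSensor for EnergyDetector {
    fn process_samples(&mut self, samples: &[Complex<f32>]) -> Option<RfFingerprint> {
        if samples.is_empty() {
            return None;
        }
        let total: f32 = samples.iter().map(|s| s.re * s.re + s.im * s.im).sum();
        let mean_power = total / samples.len() as f32;
        (mean_power > self.threshold).then(|| RfFingerprint { mean_power })
    }
}

/// Moves the boxed sensor onto its own thread and processes each sample block there.
/// This compiles only because the trait requires Send.
pub fn run_on_thread(
    mut sensor: Box<dyn RfFingerprintSensor>,
    blocks: Vec<Vec<Complex<f32>>>,
) -> Vec<RfFingerprint> {
    thread::spawn(move || {
        blocks
            .iter()
            .filter_map(|block| sensor.process_samples(block))
            .collect::<Vec<_>>()
    })
    .join()
    .expect("sensor thread panicked")
}
```

In a real integration the thread would be owned by the SDK and fed from the SDR driver's sample stream, rather than from a pre-collected vector of blocks.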
Test coverage
v0.4 ships with 465 total tests, up from 302 in v0.3 — 163 new tests across the four new capability areas.
New test categories and counts:
- SA update signing and replay detection: 31 tests. Covers valid updates, signature verification failures, duplicate timestamp rejection, and dead-reckoning accuracy under controlled delay injection.
- EW event handling and frequency hop logic: 44 tests. Covers event validation, anti-replay sequence tracking, hop plan execution, out-of-sync resync, and fabricated EW event rejection under various key states.
- Adversarial resilience: 52 tests. Covers traffic morphing correctness (all payloads pad to a valid size, no information leaked in size distribution), timing jitter bounds, degraded-mode trigger and recovery, and store-and-forward queue ordering and TTL enforcement.
- RF fingerprinting: 36 tests. Covers fingerprint extraction from recorded IQ captures, database lookup and classification accuracy, RfContact mesh propagation, and triangulation convergence under simulated two-node geometry.
Two areas received additional rigor beyond unit tests:
Property-based testing (via proptest). We model the SA layer as a state machine and generate arbitrary sequences of node joins, node departures, update broadcasts, and comms drops. The invariant under test: no fabricated position ever appears in the operational picture, and no legitimate update is silently dropped. We also property-test EW event ordering under concurrent delivery — events from multiple nodes can arrive out of order and the anti-replay tracker must produce consistent state regardless of delivery order.
Formal verification. The EW anti-replay protocol is modeled in TLA+ and verified against the replay-detection invariant: for any sequence of events from any set of nodes, no event with a previously seen sequence number from a given node can transition the system to an “accepted” state. The model covers single-node and multi-node event sources and includes the edge cases around node key revocation and session rotation that are hardest to reason about informally.
For the underlying cryptographic design this builds on — ML-KEM-768 + X25519 hybrid key exchange, Double Ratchet, and Sender Keys over MAVLink v2: Post-quantum encrypted communications for autonomous drone swarms →
For a deep dive into the Double Ratchet implementation — the ML-KEM-768 encapsulation ratchet, header encryption, out-of-order key cache, and STM32H7 benchmarks: The Swarm SDK double ratchet: forward secrecy and post-compromise security in drone mesh networks →
For the Swarm SDK overview, integration guide, and hardware compatibility matrix: Swarm SDK →