
Swarm SDK key rotation: automated cryptographic material refresh in field-deployed drone meshes

11 min read · AI Analytics
Swarm SDK · Security · Cryptography

A drone flying a six-hour reconnaissance mission cannot radio home to refresh its cryptographic material. The Fleet CA is on the ground, the RF environment is contested, and the mesh must keep running regardless. The Swarm SDK handles this by running a fully autonomous key rotation scheduler on each node: it rotates Signed Pre-Keys on a 7-day cycle, replenishes One-Time Pre-Key pools before they exhaust, and staggers rotation across the mesh so that nodes are unlikely to rotate at the same moment.

Why rotation matters in a drone mesh

The X3DH key agreement protocol that the Swarm SDK uses for session establishment depends on three distinct categories of cryptographic material: the Identity Key (IK), the Signed Pre-Key (SPK), and the One-Time Pre-Key (OTP) pool. Each has a different lifetime, a different rotation trigger, and a different consequence if it expires or exhausts without being refreshed.

The X3DH specification leaves the signed prekey interval to the application and gives once a week or once a month as examples; the Swarm SDK rotates Signed Pre-Keys every 7 days. An SPK is a medium-term Curve25519 key pair that is signed by the device's Identity Key and published in its PreKeyBundle. Any peer that wants to establish a new session uses the SPK in the X3DH computation. If a drone's SPK expires and has not been replaced, that drone can no longer be the recipient of a new session, because other nodes in the mesh will reject its stale bundle.

One-Time Pre-Keys are consumed on each session establishment. They exist specifically to provide per-session forward secrecy at the X3DH layer: each new initiating peer picks one OTP from the bundle, uses it in the key derivation, and that OTP is then marked consumed and never reused. A drone with a depleted OTP pool still functions — X3DH falls back to SPK-only mode — but loses the per-session forward secrecy guarantee for new sessions. The OTP pool must be continuously replenished.

The Identity Key is the root of trust for a node's mesh identity. It changes rarely — on a 90-day cycle — and its rotation requires coordination with the Fleet CA because the new IK must be signed into a new mission certificate. A drone whose IK expires cannot join the mesh at all; no peer will accept its unsigned bundles.

Material                Rotation trigger      Grace period   Failure mode if missed
─────────────────────────────────────────────────────────────────────────────────────
Identity Key (IK)       90 days               7 days         Cannot join mesh
Signed Pre-Key (SPK)    7 days                48 hours       Cannot receive new sessions
OTP bundle              <20 remaining         None           Reduced forward secrecy
Session ratchet         Every 100 messages    N/A            Ratchet continues

All of these rotation decisions must be made autonomously. There is no server to poll, no CA to consult in flight. The RotationScheduler runs as a periodic task on the node and acts on whatever material is locally available.
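The table above can be carried in firmware as plain data, so that every trigger is checked against the same constants. The following is a minimal sketch, not the SDK's actual representation: the `Trigger` and `Policy` types and the `POLICIES` array are hypothetical names, and only the numeric values come from the table.

```rust
/// What causes a class of key material to need attention.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Trigger {
    /// Rotate once the material is at least this many seconds old.
    MaxAgeSecs(u64),
    /// Replenish when fewer than this many keys remain.
    LowWater(u64),
    /// Advance after this many messages.
    EveryMessages(u64),
}

impl Trigger {
    /// `observed` is the age in seconds, the remaining key count, or the
    /// message count, depending on the trigger variant.
    pub fn is_due(self, observed: u64) -> bool {
        match self {
            Trigger::MaxAgeSecs(limit) => observed >= limit,
            Trigger::LowWater(min) => observed < min,
            Trigger::EveryMessages(n) => observed >= n,
        }
    }
}

/// One row of the rotation table (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Policy {
    pub name: &'static str,
    pub trigger: Trigger,
    /// Grace period in seconds; `None` where the table says "None" or "N/A".
    pub grace_secs: Option<u64>,
}

const DAY: u64 = 86_400;

pub const POLICIES: [Policy; 4] = [
    Policy { name: "IK", trigger: Trigger::MaxAgeSecs(90 * DAY), grace_secs: Some(7 * DAY) },
    Policy { name: "SPK", trigger: Trigger::MaxAgeSecs(7 * DAY), grace_secs: Some(48 * 3_600) },
    Policy { name: "OTP", trigger: Trigger::LowWater(20), grace_secs: None },
    Policy { name: "ratchet", trigger: Trigger::EveryMessages(100), grace_secs: None },
];
```

For example, `POLICIES[2].trigger.is_due(19)` is true and `POLICIES[2].trigger.is_due(20)` is false, matching the "<20 remaining" rule.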

The RotationScheduler Rust struct

The scheduler is a single Rust struct that holds the relevant expiry instants and a reference to the BKPSRAM region where long-term key material is stored. Its tick method is called by the main scheduling loop once per minute and returns a bitset of actions that the caller should execute.

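use core::sync::atomic::{AtomicU32, Ordering};
// Instant and Duration: whichever monotonic time types the firmware uses.

const SECS_PER_DAY: u64 = 86_400;

pub struct RotationScheduler {
    device_id: DeviceId,
    ik_expires_at: Instant,
    spk_expires_at: Instant,
    otp_count: AtomicU32,
    rotation_jitter_secs: u64,  // device_id.hash_u64() % 86_400 for staggered mesh rotation
    bkpsram: &'static mut BkpSramRegion,
}

impl RotationScheduler {
    pub fn tick(&mut self, now: Instant) -> RotationActions {
        let mut actions = RotationActions::empty();
        // Fire SPK rotation rotation_jitter_secs ahead of the nominal expiry.
        if now >= self.spk_expires_at - Duration::from_secs(self.rotation_jitter_secs) {
            actions |= RotationActions::ROTATE_SPK;
        }
        // Replenish once fewer than 20 One-Time Pre-Keys remain.
        if self.otp_count.load(Ordering::Relaxed) < 20 {
            actions |= RotationActions::REPLENISH_OTP;
        }
        // 7-day advance warning, written in seconds because
        // Duration::from_days is not available on stable Rust.
        if now >= self.ik_expires_at - Duration::from_secs(7 * SECS_PER_DAY) {
            actions |= RotationActions::ROTATE_IK;
        }
        actions
    }
}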

The rotation_jitter_secs field deserves attention. It is initialized once at provisioning time as device_id.hash_u64() % 86_400 — a deterministic offset in the range 0–86,399 seconds (0–24 hours). The SPK rotation check fires not at the exact expiry instant but that many seconds before it. This is the staggered mesh rotation mechanism: every drone in the fleet has a slightly different rotation offset derived from its unique device identifier.

The RotationActions type is a bitflags struct. The caller checks which actions are set and dispatches them — SPK rotation and OTP replenishment are independent operations that can be queued and executed between measurement windows without blocking the flight-critical control loop.
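To make the dispatch step concrete, here is a minimal sketch of how a caller might turn the returned flags into queued work. It uses a hand-rolled stand-in for the bitflags type so the example has no external dependencies; the bit positions, the `Job` enum, the `dispatch` function, and the ordering of jobs are all assumptions for illustration.

```rust
use core::ops::{BitOr, BitOrAssign};

/// Hand-rolled stand-in for the bitflags type (bit positions are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RotationActions(u8);

impl RotationActions {
    pub const ROTATE_SPK: Self = Self(1 << 0);
    pub const REPLENISH_OTP: Self = Self(1 << 1);
    pub const ROTATE_IK: Self = Self(1 << 2);

    pub const fn empty() -> Self { Self(0) }
    pub const fn contains(self, other: Self) -> bool { self.0 & other.0 == other.0 }
}

impl BitOr for RotationActions {
    type Output = Self;
    fn bitor(self, rhs: Self) -> Self { Self(self.0 | rhs.0) }
}

impl BitOrAssign for RotationActions {
    fn bitor_assign(&mut self, rhs: Self) { self.0 |= rhs.0; }
}

/// Work items queued for the next gap between measurement windows.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Job { RequestIkRotation, RotateSpk, ReplenishOtp }

/// Translate a flag set into an ordered job list; returns the jobs and
/// how many of the three slots are filled.
pub fn dispatch(actions: RotationActions) -> ([Option<Job>; 3], usize) {
    let mut jobs = [None; 3];
    let mut n = 0;
    // Ordering is an assumption: IK first, because it needs the ground link.
    for (flag, job) in [
        (RotationActions::ROTATE_IK, Job::RequestIkRotation),
        (RotationActions::ROTATE_SPK, Job::RotateSpk),
        (RotationActions::REPLENISH_OTP, Job::ReplenishOtp),
    ] {
        if actions.contains(flag) {
            jobs[n] = Some(job);
            n += 1;
        }
    }
    (jobs, n)
}
```

Returning a fixed-size array keeps the sketch allocation-free, which suits a no_std target; a real caller might push into its own task queue instead.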

SPK rotation procedure

Signed Pre-Key rotation is a carefully sequenced eight-step procedure. Each step is designed so that a power loss or reset at any point leaves the node in a recoverable state — either the old SPK is still intact, or the new one is fully committed.

  1. Generate new SPK keypair from TRNG. The STM32H7's hardware TRNG produces 256 bits of entropy for the new X25519 private key. The corresponding public key is derived in software.
  2. Sign new SPK with IK. The SPK signature is computed as SPK_sig = Ed25519_sign(IK_private, SPK_public). The Identity Key private key is read from BKPSRAM offset 0, used for signing, and the in-memory copy is immediately zeroed after use — it is not kept in a register or stack variable between operations.
  3. Write new SPK to BKPSRAM at offset 64. The new SPK private key (32 bytes), public key (32 bytes), and Ed25519 signature (64 bytes) are written to BKPSRAM at offsets 64, 96, and 128 respectively. At this point both old and new SPK material coexist in BKPSRAM — the old SPK has not yet been zeroized.
  4. Verify BKPSRAM contents match in-memory copy. All three fields are read back from BKPSRAM and compared byte-for-byte against the in-memory values using a constant-time comparison. If the readback fails, the rotation aborts and the old SPK remains authoritative.
  5. Zeroize old SPK region with 0xFF pattern. The old SPK private key (the 32 bytes that occupied offset 64 before the new write) is overwritten with 32 bytes of 0xFF. The private key is the only secret component; the old public key and signature are non-secret and do not require zeroization.
  6. Verify zeroization. The 32-byte region is read back and each byte is asserted to equal 0xFF. A mismatch triggers a security halt — the device refuses to continue operating with unverified key destruction.
  7. Announce new PreKeyBundle via GossipFlood. A GossipFlood message carrying the new PreKeyBundle is queued with TTL=7 (7 mesh hops). Peers receiving it update their local cache of this node's bundle and will use the new SPK for any subsequent session establishments.
  8. Update spk_expires_at. The scheduler's spk_expires_at field is updated to now + 7 days, and the new rotation timestamp is written to BKPSRAM offset 192.
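Steps 3 to 6 reduce to two primitives: write-then-verify and wipe-then-verify. The sketch below models BKPSRAM as an ordinary byte slice and takes the offset as a parameter, since where the old key is held during the transition is not something this example assumes. The function names and the `CommitError` type are hypothetical.

```rust
/// Constant-time equality: every byte is examined, with no early exit.
pub fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

#[derive(Debug, PartialEq, Eq)]
pub enum CommitError { OutOfBounds, ReadbackMismatch, WipeFailed }

/// Steps 3-4: write `data` at `offset`, then read it back and compare.
/// On a mismatch the caller aborts and keeps the old SPK authoritative.
pub fn write_and_verify(mem: &mut [u8], offset: usize, data: &[u8]) -> Result<(), CommitError> {
    let end = offset.checked_add(data.len()).ok_or(CommitError::OutOfBounds)?;
    let slot = mem.get_mut(offset..end).ok_or(CommitError::OutOfBounds)?;
    slot.copy_from_slice(data);
    if ct_eq(slot, data) { Ok(()) } else { Err(CommitError::ReadbackMismatch) }
}

/// Steps 5-6: overwrite `len` bytes with 0xFF and confirm every byte.
/// A failure here corresponds to the security halt described in step 6.
pub fn wipe_and_verify(mem: &mut [u8], offset: usize, len: usize) -> Result<(), CommitError> {
    let end = offset.checked_add(len).ok_or(CommitError::OutOfBounds)?;
    let slot = mem.get_mut(offset..end).ok_or(CommitError::OutOfBounds)?;
    for b in slot.iter_mut() {
        // Volatile store so the overwrite cannot be optimized away.
        unsafe { core::ptr::write_volatile(b, 0xFF) };
    }
    let mut ok = true;
    for b in slot.iter() {
        // Volatile load forces a real readback of each byte.
        ok &= unsafe { core::ptr::read_volatile(b) } == 0xFF;
    }
    if ok { Ok(()) } else { Err(CommitError::WipeFailed) }
}
```

Because each primitive either succeeds completely or reports an error without touching anything else, the caller can order them so that a reset between any two calls leaves one intact SPK.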

The choice of 0xFF for zeroization, rather than the more conventional 0x00, is deliberate. BKPSRAM on the STM32H7 initializes to 0xFF on power-on reset. Writing 0xFF to a region is therefore indistinguishable from an unwritten region in a forensic analysis of the raw memory. An adversary who recovers the device and reads BKPSRAM will see 0xFF at the old SPK location and cannot determine whether that region ever held a private key or was simply never written. Zeroizing with 0x00 would leave a visible pattern that marks the transition.

The full SPK rotation completes in 0.8 ms on the STM32H7 at 480 MHz: keypair generation takes 0.25 ms from the TRNG, BKPSRAM write and readback verification takes 0.3 ms, zeroization and verification takes 0.1 ms, and queuing the gossip announcement takes 0.15 ms. Both the write and the gossip steps are performed between measurement windows to avoid impacting latency on the flight control loop.

OTP replenishment

When the OTP count drops below 20, the scheduler sets the REPLENISH_OTP action flag. The replenishment task generates 50 new OTPs in batch — enough to cover the typical inter-replenishment interval with comfortable margin.

Each OTP is a Curve25519 keypair generated from the STM32H7's hardware TRNG. OTPs are stored in SRAM1, not BKPSRAM, because BKPSRAM has only 4 KB and that capacity is reserved for the Identity Key and current Signed Pre-Key. The OTP pool occupies 50 × 64 bytes = 3.2 KB in SRAM1, with a 2-byte index header tracking the next available slot:

// OTP pool layout in SRAM1
// Header: 2 bytes (u16 little-endian) — index of next unconsumed OTP
// Pool:   50 x 64 bytes — each entry is [private_key: [u8; 32], public_key: [u8; 32]]
#[repr(C)]
struct OtpPool {
    next_index: u16,
    _pad: [u8; 2],
    entries: [OtpEntry; 50],
}

#[repr(C)]
struct OtpEntry {
    private_key: [u8; 32],
    public_key:  [u8; 32],
}

// Consumption tracking — BitSet<256> supports up to 256 concurrent pre-keys
// 256 bits = 32 bytes, stored in SRAM1 adjacent to the OTP pool
struct OtpBitSet([u8; 32]);

impl OtpBitSet {
    fn mark_consumed(&mut self, index: u8) {
        self.0[(index / 8) as usize] |= 1 << (index % 8);
    }
    fn is_consumed(&self, index: u8) -> bool {
        self.0[(index / 8) as usize] & (1 << (index % 8)) != 0
    }
}

// Zeroize private key from SRAM1 immediately after consumption
fn consume_otp(pool: &mut OtpPool, bitset: &mut OtpBitSet, index: u8) -> [u8; 32] {
    let entry = &mut pool.entries[index as usize];
    let public_key = entry.public_key;
    // Volatile write ensures the compiler does not optimize away the zeroization
    for byte in entry.private_key.iter_mut() {
        unsafe { core::ptr::write_volatile(byte, 0xFF) };
    }
    bitset.mark_consumed(index);
    public_key  // only the public key is returned to the caller
}

The write_volatile pattern is critical. A standard memory write that clears a buffer just before it goes out of use will frequently be elided by the Rust compiler, because the compiler can prove the write has no observable effect on the program's externally visible behavior. A volatile write cannot be elided; the compiler must emit the store instruction regardless of context. This is the standard pattern for cryptographic zeroization in no_std Rust and serves the same purpose as memset_explicit in C23, memset_s in C11 Annex K, or SecureZeroMemory on Windows.

The BitSet<256> consumption tracker supports up to 256 concurrent pre-keys in 32 bytes of storage. In practice the pool never exceeds 50 active entries, but the 256-entry capacity allows for future expansion without changing the on-wire protocol — OTP indices are transmitted as a single byte in PreKeyBundle messages.
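Because the index arrives as a single byte from a peer, it can name a slot outside the 50-entry pool or one that was already used. A minimal sketch of a checked claim step follows; the `Consumed` type, the `OtpError` enum, and the `remaining` helper are hypothetical and stand in for whatever validation the SDK performs.

```rust
/// Number of live pool slots, per the article's 50-entry pool.
pub const POOL_LEN: usize = 50;

#[derive(Debug, PartialEq, Eq)]
pub enum OtpError {
    /// The index does not name a slot in the pool.
    OutOfRange,
    /// The slot was already handed out once.
    AlreadyConsumed,
}

/// 256-bit consumption tracker, one bit per possible wire index.
pub struct Consumed([u8; 32]);

impl Consumed {
    pub const fn new() -> Self { Self([0; 32]) }

    /// Mark `index` consumed, rejecting out-of-range and repeated indices.
    pub fn claim(&mut self, index: u8) -> Result<(), OtpError> {
        if index as usize >= POOL_LEN {
            return Err(OtpError::OutOfRange);
        }
        let byte = (index / 8) as usize;
        let mask = 1u8 << (index % 8);
        if self.0[byte] & mask != 0 {
            return Err(OtpError::AlreadyConsumed);
        }
        self.0[byte] |= mask;
        Ok(())
    }

    /// Unconsumed keys left; the scheduler compares this against 20.
    pub fn remaining(&self) -> usize {
        let used: usize = self.0.iter().map(|b| b.count_ones() as usize).sum();
        POOL_LEN - used
    }
}
```

Only bits below `POOL_LEN` can ever be set, so `remaining` cannot underflow.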

Generating a batch of 50 OTPs takes 12.5 ms on the STM32H7 — 0.25 ms per keypair from the TRNG, the same rate as SPK generation. Like SPK rotation, OTP batch generation is performed between measurement windows.

Distributed rotation coordination — staggered mesh rotation

Consider what happens if all 12 drones in a mesh rotate their SPKs simultaneously. At the moment rotation fires, every node generates and commits a new SPK and floods a new PreKeyBundle. For the 2–3 seconds it takes that flood to propagate across all hops, every node in the mesh has stale cached bundles for every peer. During that window, no node can establish a new session with any peer — the cached SPK is no longer valid, and the new one has not yet arrived.

In practice, session establishment happens continuously in a live mesh: reconnaissance drones spin up data-relay sessions when they re-enter range, command-and-control channels are established when a new tasking arrives, and mesh healing creates new sessions when a relay path changes. A 2–3 second simultaneous blackout is operationally unacceptable.

The solution is the rotation jitter. Each drone's rotation_jitter_secs is computed once at provisioning as:

// Computed once at provisioning time; stored in BKPSRAM at offset 220 (hypothetical future field)
// device_id is a 128-bit UUID unique to each physical device
let jitter_secs: u64 = device_id.hash_u64() % 86_400;

// Effect: each drone rotates its SPK at a slightly different point within the 7-day window.
// With 12 drones and jitters uniformly distributed across 86,400 seconds (24 hours):
//   Average spacing = 86,400 / 12 = 7,200 seconds = 2 hours between rotations
//   Overlapping rotations are unlikely, though hashed offsets can land close together.

// The scheduler fires rotation when:
//   now >= spk_expires_at - jitter_secs
// Which means each drone rotates jitter_secs before the nominal 7-day expiry,
// effectively shifting each drone's rotation window by a unique offset in [0, 24h).

With 12 drones and jitters uniformly distributed across 86,400 seconds, the average gap between consecutive rotations is 7,200 seconds, or two hours. The offsets are hashed rather than assigned, so two drones can still land close together: the chance that any two of the 12 fall within 3 seconds of each other is roughly 66 × 6 / 86,400, or about 0.5%. In the typical case, every other drone holds a fresh and valid cached SPK for the rotating drone, because the new bundle propagated while no other node was mid-rotation, and session establishment continues uninterrupted.
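Since the offsets come from a hash, a fleet can be checked at provisioning time for drones whose offsets happen to fall close together. The sketch below is illustrative only: the article does not define `hash_u64`, so FNV-1a over the 16 UUID bytes is used as a stand-in, and `min_gap_secs` and `is_staggered` are hypothetical helpers.

```rust
pub const DAY_SECS: u64 = 86_400;

/// FNV-1a over the 16 UUID bytes. This is a stand-in: the real
/// `hash_u64` is not specified in the article.
pub fn hash_u64(device_id: &[u8; 16]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for &b in device_id {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
    }
    h
}

/// Deterministic per-device offset in [0, 86_400).
pub fn jitter_secs(device_id: &[u8; 16]) -> u64 {
    hash_u64(device_id) % DAY_SECS
}

/// Smallest gap between any two offsets, or None for fewer than two drones.
/// Sorts the slice in place.
pub fn min_gap_secs(jitters: &mut [u64]) -> Option<u64> {
    jitters.sort_unstable();
    jitters.windows(2).map(|w| w[1] - w[0]).min()
}

/// True when every pair of offsets is at least `min_separation_secs` apart,
/// for example the few seconds a bundle flood needs to cross the mesh.
pub fn is_staggered(jitters: &mut [u64], min_separation_secs: u64) -> bool {
    match min_gap_secs(jitters) {
        Some(gap) => gap >= min_separation_secs,
        None => true,
    }
}
```

A fleet that fails the check could be handled by re-provisioning one device ID; that policy is outside what the article describes.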

The jitter is deterministic but unpredictable to an outside observer — the hash of the device ID is not transmitted in any protocol message. An adversary monitoring bundle rotation events cannot easily map rotation times back to device identities or predict when a particular drone's SPK will next rotate.

Emergency key revocation

When a drone is captured or its key material is compromised, whether by a physical breach, a side-channel attack, or an operational error, the Fleet CA issues a KeyRevocationAnnouncement. This message is authoritative because it is signed by the Fleet CA's own Ed25519 key, which every node in the fleet has pre-loaded at provisioning time.

pub struct KeyRevocationAnnouncement {
    pub revoked_device_id: DeviceId,
    pub revoked_ik_fingerprint: [u8; 32],
    pub reason: RevocationReason,
    pub effective_at: u64,  // Unix timestamp
    pub ca_signature: [u8; 64],  // Fleet CA Ed25519 signature
    pub sequence_number: u64,    // Monotonic, prevents replay
}

The sequence_number field is monotonically increasing per-device. Each node maintains a local table of the highest sequence number it has seen for each device ID. A revocation announcement with a sequence number lower than or equal to the stored maximum is silently discarded — this prevents an adversary from replaying an old revocation message to disrupt a legitimate device that has since been re-provisioned with a new IK.
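The replay check can be sketched as a small fixed-capacity table keyed by device ID. This is an illustration under assumptions: `DeviceId` is modeled as a `u128` (the article describes a 128-bit UUID), the `SeqTable` and `Verdict` names are hypothetical, and the caller is assumed to have verified the CA signature before calling `accept`.

```rust
/// Assumption: the 128-bit device UUID is modeled as a u128.
pub type DeviceId = u128;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    /// Sequence number is higher than anything seen; state was updated.
    Fresh,
    /// Sequence number is lower than or equal to the stored maximum.
    Replay,
    /// No room to track a new device; the caller decides how to react.
    TableFull,
}

/// Highest sequence number seen per device, in a fixed-size table.
pub struct SeqTable<const N: usize> {
    entries: [Option<(DeviceId, u64)>; N],
}

impl<const N: usize> SeqTable<N> {
    pub const fn new() -> Self {
        Self { entries: [None; N] }
    }

    /// Call only after the announcement's CA signature has been verified,
    /// so that forged messages cannot advance the stored maximum.
    pub fn accept(&mut self, device: DeviceId, seq: u64) -> Verdict {
        let mut free = None;
        for (i, slot) in self.entries.iter_mut().enumerate() {
            match slot {
                Some((id, highest)) => {
                    if *id == device {
                        if seq <= *highest {
                            return Verdict::Replay;
                        }
                        *highest = seq;
                        return Verdict::Fresh;
                    }
                }
                None => {
                    if free.is_none() {
                        free = Some(i);
                    }
                }
            }
        }
        match free {
            Some(i) => {
                self.entries[i] = Some((device, seq));
                Verdict::Fresh
            }
            None => Verdict::TableFull,
        }
    }
}
```

`TableFull` is reported separately rather than folded into `Replay`, because silently dropping a valid revocation would leave a compromised device trusted.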

The revocation announcement is flooded through the gossip mesh with TTL=∞ — it is re-flooded on every subsequent mesh join, meaning a node that was offline when the revocation was issued will receive it as soon as it re-enters the mesh. Every peer receiving a valid announcement immediately takes three actions:

  • Drop all active sessions with the revoked device. All session state, chain keys, and cached message keys for that device ID are zeroized from SRAM and the session entries are removed.
  • Refuse new sessions from the revoked IK fingerprint. Any subsequent X3DH session establishment attempt presenting that fingerprint is rejected at the handshake layer before any key derivation takes place.
  • Persist the revocation in BKPSRAM. The revoked IK fingerprint is added to a local revocation cache stored in BKPSRAM and retained for 90 days. Across reboots, power cycles, and firmware updates, the revocation remains in force.

The 90-day retention period matches the maximum IK lifetime. After 90 days, a revoked device's IK fingerprint cannot appear in a valid new bundle regardless — the IK would have expired — so retaining the revocation entry beyond that point provides no additional protection and wastes the limited BKPSRAM capacity reserved for the revocation cache.
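A minimal sketch of such a revocation cache follows, using the 40-byte entry (32-byte fingerprint plus 8-byte expiry) and the 95-entry capacity given in the layout discussion. The type and method names are hypothetical, and counting the 90 days from the announcement's `effective_at` is an assumption; the SDK may count from receipt instead.

```rust
pub const RETENTION_SECS: u64 = 90 * 86_400;
pub const CAPACITY: usize = 95;

/// One cache entry: 32-byte IK fingerprint plus 8-byte expiry (40 bytes).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Revocation {
    pub fingerprint: [u8; 32],
    pub expires_at: u64, // Unix timestamp, seconds
}

pub struct RevocationCache {
    entries: [Option<Revocation>; CAPACITY],
}

impl RevocationCache {
    pub const fn new() -> Self {
        Self { entries: [None; CAPACITY] }
    }

    /// Record a revocation. Returns false when the cache is full.
    /// Assumption: retention is counted from `effective_at`.
    pub fn insert(&mut self, fingerprint: [u8; 32], effective_at: u64) -> bool {
        let entry = Revocation {
            fingerprint,
            expires_at: effective_at.saturating_add(RETENTION_SECS),
        };
        match self.entries.iter_mut().find(|slot| slot.is_none()) {
            Some(slot) => {
                *slot = Some(entry);
                true
            }
            None => false,
        }
    }

    /// True while a matching, unexpired entry exists.
    pub fn is_revoked(&self, fingerprint: &[u8; 32], now: u64) -> bool {
        self.entries
            .iter()
            .flatten()
            .any(|r| &r.fingerprint == fingerprint && now < r.expires_at)
    }

    /// Drop entries whose retention has elapsed; returns how many were freed.
    pub fn prune(&mut self, now: u64) -> usize {
        let mut removed = 0;
        for slot in self.entries.iter_mut() {
            let expired = match slot.as_ref() {
                Some(r) => now >= r.expires_at,
                None => false,
            };
            if expired {
                *slot = None;
                removed += 1;
            }
        }
        removed
    }
}
```

The in-memory `Option` wrapper is a convenience of the sketch; a persisted form would store the bare 40-byte entries.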

IK rotation — the 90-day cycle

Identity Key rotation is the most consequential operation in the key lifecycle. The IK is the root of the device's mesh identity: every SPK is signed by the IK, every mission certificate chains to the IK, and every peer has verified the IK fingerprint at session establishment. Rotating the IK is effectively changing who the drone is within the mesh.

The IK rotation procedure requires the Fleet CA and cannot be performed entirely autonomously. The scheduler raises the ROTATE_IK action flag 7 days before the IK's 90-day expiry — this is the advance warning window. The ground-side operator must complete the rotation before the expiry deadline. The sequence is:

  1. New IK generated on-device. The drone's TRNG generates a fresh Ed25519 keypair. The new IK private key is written to a staging region in BKPSRAM (offset 212 in the reserved block) — not yet the authoritative IK slot.
  2. CSR sent to GCS over secure channel. The new IK public key is wrapped in a Certificate Signing Request and transmitted to the Ground Control Station over the existing encrypted session, which still uses the old IK. The GCS relays the CSR to the Fleet CA.
  3. Fleet CA signs new IK with mission cert. The CA issues a new mission certificate binding the new IK public key to the device ID. The signed certificate is transmitted back to the drone over the GCS link.
  4. New signed bundle flooded through mesh. The drone floods a new PreKeyBundle containing the new IK fingerprint and a fresh SPK signed by the new IK. Peers that receive this bundle begin caching it for future sessions while continuing to accept the old IK for active sessions.
  5. 7-day overlap window. Both old and new IK bundles are simultaneously valid for 7 days. During this window, existing sessions established under the old IK continue to operate; new sessions prefer the new IK. This prevents any session disruption during the transition.
  6. Old IK zeroized from BKPSRAM. After the 7-day overlap window, the old IK private key (offset 0, 32 bytes) is overwritten with the 0xFF pattern and verified. The new IK is promoted from the staging region to the authoritative slot.

The 7-day overlap window matches the 7-day IK grace period in the rotation table above: it ensures that no peer is left with a stale session derived from an IK that has already been destroyed. Active sessions continue without interruption; only new session establishments change over to the new IK.
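The six steps can be viewed as a small state machine driven by three events. The sketch below is one possible modeling, not the SDK's implementation: the state and event names are hypothetical, and it assumes the overlap window starts when the signed certificate arrives and the new bundle is flooded.

```rust
pub const OVERLAP_SECS: u64 = 7 * 86_400;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IkRotation {
    /// No rotation in progress.
    Idle,
    /// Steps 1-2 done: new IK staged, CSR sent to the GCS.
    AwaitingCert,
    /// Steps 3-5: certificate received, new bundle flooded, both IKs valid.
    Overlap { ends_at: u64 },
    /// Step 6: old IK wiped, new IK promoted to the authoritative slot.
    Promoted,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    /// The scheduler raised ROTATE_IK.
    RotateFlag,
    /// The signed mission certificate arrived at time `now`.
    CertReceived { now: u64 },
    /// Periodic check at time `now`.
    Tick { now: u64 },
}

impl IkRotation {
    /// Pure transition function; events that do not apply leave the state unchanged.
    pub fn next(self, ev: Event) -> Self {
        match (self, ev) {
            (IkRotation::Idle, Event::RotateFlag) => IkRotation::AwaitingCert,
            (IkRotation::AwaitingCert, Event::CertReceived { now }) => {
                IkRotation::Overlap { ends_at: now.saturating_add(OVERLAP_SECS) }
            }
            (IkRotation::Overlap { ends_at }, Event::Tick { now }) if now >= ends_at => {
                IkRotation::Promoted
            }
            (state, _) => state,
        }
    }
}
```

Keeping the transition function free of side effects means the current state can be persisted and the sequence resumed after a reset, with the actual key writes performed by the caller on each state change.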

BKPSRAM 4 KB layout

All long-term cryptographic material — Identity Key, Signed Pre-Key, and associated metadata — is stored in the STM32H7's Backup SRAM (BKPSRAM). This 4 KB region is battery-backed, survives system resets and low-power modes, and is inaccessible to the DMA controller. The layout is fixed at provisioning time and must not be changed across firmware updates:

Offset   Size      Contents
─────────────────────────────────────────────────────────────
0        32 B      IK private key (Ed25519)
32       32 B      IK public key
64       32 B      Current SPK private key (X25519)
96       32 B      Current SPK public key
128      64 B      SPK Ed25519 signature
192      8 B       SPK rotation timestamp
200      8 B       IK rotation timestamp
208      4 B       OTP bundle pointer (index into SRAM1)
212      3884 B    Reserved / future use

The layout places the IK private key at offset 0 so that a read of the first 32 bytes always retrieves the most sensitive material. The SPK private key at offset 64 follows the IK pair, so all secret key material sits within the first 96 bytes. That range spans three 32-byte cache lines on the Cortex-M7, not one, so it is not fetched in a single memory transaction; the practical benefit is that the secrets occupy one short contiguous range.

The reserved space at offset 212 runs to the end of the 4,096-byte region, which makes it 3,884 bytes. It is available for the revocation cache: each 32-byte IK fingerprint plus an 8-byte expiry timestamp costs 40 bytes, so 95 revocation entries fit in 3,800 bytes, leaving 84 bytes for a cache header. Future firmware versions may allocate this space for the IK staging region during rotation and for other per-device metadata without changing the layout of the first 212 bytes.
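Because the layout must stay fixed across firmware updates, it is worth checking at compile time that the fields are contiguous and that the reserved block starts where expected. The sketch below copies the offsets and sizes from the table; the field names, the `FIELDS` array, and the helper function are hypothetical.

```rust
/// Size of the STM32H7 backup SRAM region described in the article.
pub const BKPSRAM_SIZE: usize = 4096;

/// (name, offset, size) for each fixed field, copied from the layout table.
/// The names are illustrative labels, not SDK identifiers.
pub const FIELDS: [(&str, usize, usize); 8] = [
    ("ik_private", 0, 32),
    ("ik_public", 32, 32),
    ("spk_private", 64, 32),
    ("spk_public", 96, 32),
    ("spk_signature", 128, 64),
    ("spk_rotated_at", 192, 8),
    ("ik_rotated_at", 200, 8),
    ("otp_pointer", 208, 4),
];

/// Walks the table and returns the first free offset, or None if any field
/// overlaps its predecessor or leaves a gap after it.
pub const fn end_of_fixed_fields() -> Option<usize> {
    let mut cursor = 0;
    let mut i = 0;
    while i < FIELDS.len() {
        if FIELDS[i].1 != cursor {
            return None;
        }
        cursor += FIELDS[i].2;
        i += 1;
    }
    Some(cursor)
}

/// Start of the reserved block; evaluation fails the build if the table is inconsistent.
pub const RESERVED_OFFSET: usize = match end_of_fixed_fields() {
    Some(end) => end,
    None => panic!("BKPSRAM layout has a gap or overlap"),
};

// Compile-time guard: the reserved block starts at 212 and lies inside the region.
const _: () = assert!(RESERVED_OFFSET == 212 && RESERVED_OFFSET < BKPSRAM_SIZE);
```

A firmware change that moves or resizes any field then breaks the build instead of silently corrupting provisioned devices.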

Performance summary

Each individual rotation step is designed to complete within a single 10 ms measurement window on the STM32H7 at 480 MHz; the one exception, OTP batch generation, is split into per-key steps. The following benchmarks were measured on hardware with compiler optimizations enabled and hardware AES disabled (software AES-256-GCM):

Key rotation performance — STM32H7 (Cortex-M7, 480 MHz)

Operation                               Time
─────────────────────────────────────────────────────────────
SPK keypair generation (TRNG)          0.25 ms
BKPSRAM write + readback verify        0.30 ms
Zeroization + verify (32 bytes)        0.10 ms
GossipFlood announcement queued        0.15 ms
Total SPK rotation                     0.80 ms

OTP keypair generation (1 key)         0.25 ms
OTP batch generation (50 keys)        12.50 ms
OTP private key zeroize (volatile)     0.003 ms per key

RotationScheduler.tick()               0.003 ms (no actions)
                                       0.006 ms (with action check + bitflags OR)

SPK rotation at 0.8 ms fits comfortably within a 10 ms measurement window. OTP batch generation at 12.5 ms spans two measurement windows — in practice it is scheduled across a low-priority background task that yields after each keypair, so the flight-critical control loop is never blocked for the full 12.5 ms at once. The per-key cost is constant at 0.25 ms; generating 50 keys in 50 consecutive yields has negligible effect on overall system latency.
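The yield-per-keypair scheduling can be sketched as a resumable task that generates exactly one keypair per call. This is an illustration under assumptions: the `Replenisher` type is hypothetical, the `keygen` closure stands in for the TRNG plus public-key derivation, and the batch is assumed to refill the 50-entry pool from slot 0.

```rust
/// Replenish when fewer than this many OTPs remain.
pub const LOW_WATER: usize = 20;
/// Batch size, equal to the pool size in the article.
pub const POOL_SIZE: usize = 50;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct OtpEntry {
    pub private_key: [u8; 32],
    pub public_key: [u8; 32],
}

pub fn needs_replenish(remaining: usize) -> bool {
    remaining < LOW_WATER
}

/// Resumable batch generator: one keypair per `step`, so the caller can
/// yield to flight-critical work between calls.
pub struct Replenisher {
    next: usize,
}

impl Replenisher {
    pub const fn new() -> Self {
        Self { next: 0 }
    }

    /// Fill the next slot using `keygen`, a stand-in for the hardware TRNG
    /// and public-key derivation. Returns true once the batch is complete.
    /// Assumption: the batch overwrites the pool starting at slot 0.
    pub fn step<F>(&mut self, pool: &mut [OtpEntry; POOL_SIZE], mut keygen: F) -> bool
    where
        F: FnMut() -> OtpEntry,
    {
        if self.next < POOL_SIZE {
            pool[self.next] = keygen();
            self.next += 1;
        }
        self.next == POOL_SIZE
    }

    /// Slots filled so far in the current batch.
    pub fn progress(&self) -> usize {
        self.next
    }
}
```

At roughly 0.25 ms per keypair, each `step` call is short compared with the 10 ms window, and the background task simply calls it again after the next yield until it returns true.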

The RotationScheduler.tick() call itself is effectively free: three comparisons and up to three bitflags OR operations. At 3 to 6 microseconds per call and one call per minute, it consumes at most about 0.00001% of available compute. The scheduler is the least expensive component in the entire key management stack.

