Binary Frame Format
The 12-byte header and sample payload layout for /stream/ingest.
Every frame sent on /stream/ingest is a binary WebSocket message: a 12-byte header followed by a raw sample payload. Nothing more, nothing less.
Layout
offset  size  field         type       notes
──────  ────  ────────────  ─────────  ─────────────────────────────────────────────
0       1     slot_id       uint8      slot from the handshake manifest
1       8     t0_epoch_ms   int64 BE   ms since epoch, timestamp of sample[0]
9       2     sample_count  uint16 BE  number of samples in payload
11      1     flags         uint8      reserved, send 0
12      N     payload       bytes      sample_count × bytes_per_sample × num_channels

The header integer fields (t0_epoch_ms, sample_count) are big-endian. Payload samples are little-endian; this is the native byte order on x86/ARM, so in practice you can memcpy a native int16_t[] straight into the payload.
Payload size = sample_count × bytes_per_sample × num_channels, where bytes_per_sample = ceil(bit_depth / 8) and num_channels comes from your device model's modality channel config.
A valid frame therefore has total size 12 + sample_count × bytes_per_sample × num_channels bytes.
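The size arithmetic above can be sketched as a small Python helper (`expected_frame_size` is hypothetical, not part of any SDK):

```python
import math

def expected_frame_size(sample_count: int, bit_depth: int, num_channels: int) -> int:
    """Total frame size in bytes: 12-byte header plus payload."""
    bytes_per_sample = math.ceil(bit_depth / 8)
    return 12 + sample_count * bytes_per_sample * num_channels

# 100 samples of 16-bit single-channel PPG -> 212 bytes
print(expected_frame_size(100, 16, 1))
# 50 samples of 16-bit 3-axis accelerometer -> 312 bytes
print(expected_frame_size(50, 16, 3))
```

These match the worked examples later in this page (212 and 312 bytes).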
t0_epoch_ms is the only temporal ground truth
The pipeline uses t0_epoch_ms for every time-dependent decision: windowing, cross-modal alignment, ordering of samples on a stream. It does not use WebSocket arrival time as a proxy, and it has no way to reconstruct per-sample timing from anything else in the frame.
Consequences for firmware:
- Take t0 from a single monotonic hardware clock shared across all sensor slots you send. Per-sensor buffer clocks that drift independently will cause motion-rejected HR quality to degrade, and the server can't detect or fix this.
- Mis-batching modalities (e.g. sending PPG every 1 s but ACC every 4 s) means the ACC reference is stale for most of each window, and the motion-artifact node will drop it rather than fuse mistimed data. Keep cadences symmetric across modalities you want fused.
See Streaming from Firmware → Sample pairing across modalities for the full best-practice tiers.
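One practical consequence of t0 being the only temporal ground truth: for gapless sampling, each frame's t0 must advance by exactly the duration the previous payload covered. A sketch (`next_t0_ms` is a hypothetical helper):

```python
def next_t0_ms(t0_ms: int, sample_count: int, sample_rate_hz: float) -> int:
    """t0 for the next frame when sampling is gapless: the previous
    t0 plus the duration covered by the previous payload, in ms."""
    return t0_ms + round(sample_count * 1000 / sample_rate_hz)

# 100 samples at 100 Hz cover exactly one second
print(next_t0_ms(1760000000000, 100, 100.0))  # 1760000001000
```

Deriving t0 this way from a frame counter (rather than stamping "now" at send time) keeps the stream immune to transmission jitter.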
Example: 100 samples of 16-bit PPG
- `slot_id` = 0 (PPG green slot from the handshake)
- `t0_epoch_ms` = 1760000000000
- `sample_count` = 100
- `flags` = 0
- payload = 200 bytes (100 samples × 2 bytes × 1 channel)
Total frame size: 212 bytes.
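Assuming the values above (and placeholder sample data), this frame can be assembled with Python's struct module and its sizes checked:

```python
import struct

# Header: big-endian uint8, int64, uint16, uint8 -> 12 bytes
header = struct.pack(">BqHB", 0, 1760000000000, 100, 0)
# Payload: 100 int16 little-endian samples (dummy ramp data here)
payload = struct.pack("<100h", *range(100))
frame = header + payload

print(len(header), len(payload), len(frame))  # 12 200 212
```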
Example: 50 samples of 3-axis 16-bit accelerometer
- `slot_id` = 1 (accelerometer slot)
- `sample_count` = 50
- payload = 300 bytes (50 samples × 2 bytes × 3 channels, interleaved `[x, y, z, x, y, z, ...]`)
Total: 312 bytes.
Channel interleaving
Multi-channel payloads are sample-major, channel-minor:
sample 0: [ch0][ch1][ch2]
sample 1: [ch0][ch1][ch2]
sample 2: [ch0][ch1][ch2]
...

So a 3-axis accelerometer at 50 Hz, int16 LE, sends 6 bytes per sample time: x_lo x_hi y_lo y_hi z_lo z_hi.
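If firmware buffers each axis separately, interleaving is a simple flatten of per-sample triples. A sketch with dummy axis data:

```python
import struct

x = [100, 101, 102]
y = [200, 201, 202]
z = [-50, -51, -52]

# Sample-major, channel-minor: one (x, y, z) triple per sample time.
interleaved = [v for triple in zip(x, y, z) for v in triple]
payload = struct.pack(f"<{len(interleaved)}h", *interleaved)  # int16 LE

print(interleaved)   # [100, 200, -50, 101, 201, -51, 102, 202, -52]
print(len(payload))  # 18 bytes = 3 samples x 3 channels x 2 bytes
```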
Sample encoding
Sample width is derived from your device model's bit_depth:
| bit_depth | bytes_per_sample | Example int type      |
|-----------|------------------|-----------------------|
| 8         | 1                | int8 / uint8          |
| 12        | 2 (padded)       | int16 LE              |
| 16        | 2                | int16 LE              |
| 24        | 3                | int32 LE (padded)     |
| 32        | 4                | int32 LE / float32 LE |
Raeh doesn't reinterpret the sample values; the signal-processing code reads them as whatever type matches the modality's conventional format (for PPG, signed int16 LE).
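The table above reduces to "encode little-endian, padded up to ceil(bit_depth / 8) bytes". A hypothetical helper for signed samples illustrates the widths:

```python
def pack_sample(value: int, bit_depth: int) -> bytes:
    """Encode one signed sample little-endian, padded up to
    ceil(bit_depth / 8) bytes, per the bit_depth table."""
    nbytes = (bit_depth + 7) // 8
    return value.to_bytes(nbytes, "little", signed=True)

print(pack_sample(2048, 16).hex())  # 0008  (0x0800 stored low byte first)
print(pack_sample(100, 12).hex())   # 6400  (12-bit value padded to 2 bytes)
print(pack_sample(-1, 24).hex())    # ffffff
```

Float32 payloads are the one case this doesn't cover; those would use IEEE-754 little-endian (`struct.pack("<f", ...)` in Python).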
Encoding in Python
import struct
HEADER_FMT = ">BqHB" # big-endian: uint8, int64, uint16, uint8
def encode_frame(slot_id: int, t0_ms: int, samples: list[int]) -> bytes:
    header = struct.pack(HEADER_FMT, slot_id, t0_ms, len(samples), 0)
    # Samples are LITTLE-endian int16 (header format above is BE for its fields)
    payload = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)
    return header + payload

Encoding in C
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h> // for htons
size_t encode_frame(uint8_t *out, uint8_t slot_id, int64_t t0_ms,
uint16_t sample_count, const int16_t *samples) {
out[0] = slot_id;
// int64 BE
for (int i = 0; i < 8; i++) out[1 + i] = (t0_ms >> (56 - 8 * i)) & 0xFF;
uint16_t sc = htons(sample_count);
memcpy(out + 9, &sc, 2);
out[11] = 0; // flags
// Samples are LITTLE-endian. On x86/ARM you can memcpy the native buffer directly.
memcpy(out + 12, samples, sample_count * 2);
return 12 + sample_count * 2;
}

Encoding in JavaScript (Node or browser)
function encodeFrame(slotId, t0Ms, samples) {
const buf = new ArrayBuffer(12 + samples.length * 2);
const view = new DataView(buf);
view.setUint8(0, slotId);
view.setBigInt64(1, BigInt(t0Ms), false); // false = big-endian
view.setUint16(9, samples.length, false);
view.setUint8(11, 0);
for (let i = 0; i < samples.length; i++) {
view.setInt16(12 + i * 2, samples[i], true); // true = little-endian (samples)
}
return new Uint8Array(buf);
}

Common encoding mistakes
- Wrong endianness. The header is big-endian; the sample payload is little-endian. Mixing them up is the single most common mistake; the server will still accept the frame, but the decoded values will be garbage (a 16-bit value of 2048 = `0x0800` becomes `0x0008` = 8 if you swap).
- Wrong `sample_count`. Must be the number of sample times, not per-channel values or bytes. A 3-axis accelerometer frame with 50 samples has `sample_count = 50`, not 150.
- Off-by-one in `t0_epoch_ms`. The timestamp is for sample[0], not for "now" or for the last sample.
- Mixing channel order. Always sample-major: `[x0, y0, z0, x1, y1, z1, ...]`, not `[x0, x1, x2, ..., y0, y1, y2, ...]`.
- Using microseconds or seconds. It's milliseconds since epoch.
On any of these, the server closes the WebSocket with code 4004 and a diagnostic reason string. Log it, fix, reconnect.
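Several of these mistakes are catchable client-side before the frame ever leaves the device. A hypothetical pre-send check (the server's actual 4004 diagnostics may word things differently):

```python
import struct

def validate_frame(frame: bytes, bytes_per_sample: int, num_channels: int) -> None:
    """Raise ValueError for frames the server would reject with close code 4004."""
    if len(frame) < 12:
        raise ValueError("frame shorter than the 12-byte header")
    slot_id, t0_ms, sample_count, flags = struct.unpack(">BqHB", frame[:12])
    if flags != 0:
        raise ValueError("flags is reserved; send 0")
    expected = sample_count * bytes_per_sample * num_channels
    if len(frame) - 12 != expected:
        raise ValueError(
            f"payload is {len(frame) - 12} bytes, expected {expected} "
            f"({sample_count} samples x {bytes_per_sample} x {num_channels})"
        )

# A well-formed 2-sample int16 frame passes silently
frame = struct.pack(">BqHB", 0, 1760000000000, 2, 0) + struct.pack("<2h", 1, -2)
validate_frame(frame, bytes_per_sample=2, num_channels=1)
```

Running this in a debug build catches the `sample_count`-in-bytes mistake (the payload length check fails) without a round trip to the server.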