Binary Frame Format
The 12-byte header and sample payload layout for /stream/ingest.
Every frame sent on /stream/ingest is a binary WebSocket message: a 12-byte header followed by a raw sample payload. Nothing more, nothing less.
Layout
offset  size  field         type       notes
──────  ────  ────────────  ─────────  ─────────────────────────────────────────────
0       1     slot_id       uint8      slot from the handshake manifest
1       8     t0_epoch_ms   int64 BE   ms since epoch, timestamp of sample[0]
9       2     sample_count  uint16 BE  number of samples in payload
11      1     flags         uint8      reserved, send 0
12      N     payload       bytes      sample_count × bytes_per_sample × num_channels

The header integer fields (t0_epoch_ms, sample_count) are big-endian. Payload samples are little-endian; this is the native byte order on x86/ARM, so in practice you can memcpy a native int16_t[] straight into the payload.
Payload size = sample_count × bytes_per_sample × num_channels, where bytes_per_sample = ceil(bit_depth / 8) and num_channels comes from your device model's modality channel config.
A valid frame therefore has total size 12 + sample_count × bytes_per_sample × num_channels bytes.
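The size arithmetic above can be sketched as a small Python helper (`expected_frame_size` is hypothetical, not part of any SDK):

```python
import math

def expected_frame_size(sample_count: int, bit_depth: int, num_channels: int) -> int:
    """Total frame size in bytes: 12-byte header plus payload."""
    bytes_per_sample = math.ceil(bit_depth / 8)
    return 12 + sample_count * bytes_per_sample * num_channels

# 100 samples of 16-bit single-channel PPG -> 212 bytes
print(expected_frame_size(100, 16, 1))
# 50 samples of 16-bit 3-axis accelerometer -> 312 bytes
print(expected_frame_size(50, 16, 3))
```

These match the worked examples later in this page (212 and 312 bytes).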
t0_epoch_ms is the only temporal ground truth
The pipeline uses t0_epoch_ms for every time-dependent decision: windowing, cross-modal alignment, ordering of samples on a stream. It does not use WebSocket arrival time as a proxy, and it has no way to reconstruct per-sample timing from anything else in the frame.
Consequences for firmware:
- Take t0 from a single monotonic hardware clock shared across all sensor slots you send. Per-sensor buffer clocks that drift independently will cause motion-rejected HR quality to degrade, and the server can't detect or fix this.
- Mis-batching modalities (e.g. sending PPG every 1 s but ACC every 4 s) means the ACC reference is stale for most of each window, and the motion-artifact node will drop it rather than fuse mistimed data. Keep cadences symmetric across modalities you want fused.
See Streaming from Firmware → Sample pairing across modalities for the full best-practice tiers.
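One practical consequence of t0 being the only temporal ground truth: for gapless sampling, each frame's t0 must advance by exactly the duration the previous payload covered. A sketch (`next_t0_ms` is a hypothetical helper):

```python
def next_t0_ms(t0_ms: int, sample_count: int, sample_rate_hz: float) -> int:
    """t0 for the next frame when sampling is gapless: the previous
    t0 plus the duration covered by the previous payload, in ms."""
    return t0_ms + round(sample_count * 1000 / sample_rate_hz)

# 100 samples at 100 Hz cover exactly one second
print(next_t0_ms(1760000000000, 100, 100.0))  # 1760000001000
```

Deriving t0 this way from a frame counter (rather than stamping "now" at send time) keeps the stream immune to transmission jitter.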
Example: 100 samples of 16-bit PPG
- `slot_id` = 0 (PPG green slot from the handshake)
- `t0_epoch_ms` = 1760000000000
- `sample_count` = 100
- `flags` = 0
- payload = 200 bytes (100 samples × 2 bytes × 1 channel)
Total frame size: 212 bytes.
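Assuming the values above (and placeholder sample data), this frame can be assembled with Python's struct module and its sizes checked:

```python
import struct

# Header: big-endian uint8, int64, uint16, uint8 -> 12 bytes
header = struct.pack(">BqHB", 0, 1760000000000, 100, 0)
# Payload: 100 int16 little-endian samples (dummy ramp data here)
payload = struct.pack("<100h", *range(100))
frame = header + payload

print(len(header), len(payload), len(frame))  # 12 200 212
```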
Example: 50 samples of 3-axis 16-bit accelerometer
- `slot_id` = 1 (accelerometer slot)
- `sample_count` = 50
- payload = 300 bytes (50 samples × 2 bytes × 3 channels, interleaved `[x, y, z, x, y, z, ...]`)
Total: 312 bytes.
Channel interleaving
Multi-channel payloads are sample-major, channel-minor:
sample 0: [ch0][ch1][ch2]
sample 1: [ch0][ch1][ch2]
sample 2: [ch0][ch1][ch2]
...

So a 3-axis accelerometer at 50 Hz, int16 LE, sends 6 bytes per sample time: x_lo x_hi y_lo y_hi z_lo z_hi.
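If firmware buffers each axis separately, interleaving is a simple flatten of per-sample triples. A sketch with dummy axis data:

```python
import struct

x = [100, 101, 102]
y = [200, 201, 202]
z = [-50, -51, -52]

# Sample-major, channel-minor: one (x, y, z) triple per sample time.
interleaved = [v for triple in zip(x, y, z) for v in triple]
payload = struct.pack(f"<{len(interleaved)}h", *interleaved)  # int16 LE

print(interleaved)   # [100, 200, -50, 101, 201, -51, 102, 202, -52]
print(len(payload))  # 18 bytes = 3 samples x 3 channels x 2 bytes
```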
Sample encoding
Sample width is derived from your device model's bit_depth:
| bit_depth | bytes_per_sample | Example int type      |
|-----------|------------------|-----------------------|
| 8         | 1                | int8 / uint8          |
| 12        | 2 (padded)       | int16 LE              |
| 16        | 2                | int16 LE              |
| 24        | 3                | int32 LE (padded)     |
| 32        | 4                | int32 LE / float32 LE |
Raeh doesn't reinterpret the sample values; the signal-processing code reads them as whatever type matches the modality's conventional format (for PPG, signed int16 LE).
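The table above reduces to "encode little-endian, padded up to ceil(bit_depth / 8) bytes". A hypothetical helper for signed samples illustrates the widths:

```python
def pack_sample(value: int, bit_depth: int) -> bytes:
    """Encode one signed sample little-endian, padded up to
    ceil(bit_depth / 8) bytes, per the bit_depth table."""
    nbytes = (bit_depth + 7) // 8
    return value.to_bytes(nbytes, "little", signed=True)

print(pack_sample(2048, 16).hex())  # 0008  (0x0800 stored low byte first)
print(pack_sample(100, 12).hex())   # 6400  (12-bit value padded to 2 bytes)
print(pack_sample(-1, 24).hex())    # ffffff
```

Float32 payloads are the one case this doesn't cover; those would use IEEE-754 little-endian (`struct.pack("<f", ...)` in Python).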
Encoding in Python
import struct
HEADER_FMT = ">BqHB" # big-endian: uint8, int64, uint16, uint8
def encode_frame(slot_id: int, t0_ms: int, samples: list[int]) -> bytes:
    header = struct.pack(HEADER_FMT, slot_id, t0_ms, len(samples), 0)
    # Samples are LITTLE-endian int16 (header format above is BE for its fields)
    payload = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)
    return header + payload

Encoding in C
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h> // for htons
size_t encode_frame(uint8_t *out, uint8_t slot_id, int64_t t0_ms,
uint16_t sample_count, const int16_t *samples) {
out[0] = slot_id;
// int64 BE
for (int i = 0; i < 8; i++) out[1 + i] = (t0_ms >> (56 - 8 * i)) & 0xFF;
uint16_t sc = htons(sample_count);
memcpy(out + 9, &sc, 2);
out[11] = 0; // flags
// Samples are LITTLE-endian. On x86/ARM you can memcpy the native buffer directly.
memcpy(out + 12, samples, sample_count * 2);
return 12 + sample_count * 2;
}

Encoding in JavaScript (Node or browser)
function encodeFrame(slotId, t0Ms, samples) {
const buf = new ArrayBuffer(12 + samples.length * 2);
const view = new DataView(buf);
view.setUint8(0, slotId);
view.setBigInt64(1, BigInt(t0Ms), false); // false = big-endian
view.setUint16(9, samples.length, false);
view.setUint8(11, 0);
for (let i = 0; i < samples.length; i++) {
view.setInt16(12 + i * 2, samples[i], true); // true = little-endian (samples)
}
return new Uint8Array(buf);
}

Common encoding mistakes
- Wrong endianness. The header is big-endian; the sample payload is little-endian. Mixing them up is the single most common mistake; the server will still accept the frame, but the decoded values will be garbage (a 16-bit value of 2048 = `0x0800` becomes `0x0008` = 8 if you swap).
- Wrong `sample_count`. Must be the number of sample times, not per-channel values or bytes. A 3-axis accelerometer frame with 50 samples has `sample_count = 50`, not 150.
- Off-by-one in `t0_epoch_ms`. The timestamp is for sample[0], not for "now" or for the last sample.
- Mixing channel order. Always sample-major: `[x0, y0, z0, x1, y1, z1, ...]`, not `[x0, x1, x2, ..., y0, y1, y2, ...]`.
- Using microseconds or seconds. It's milliseconds since epoch.
On any of these, the server closes the WebSocket with code 4004 and a diagnostic reason string. Log it, fix, reconnect.
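Several of these mistakes are catchable client-side before the frame ever leaves the device. A hypothetical pre-send check (the server's actual 4004 diagnostics may word things differently):

```python
import struct

def validate_frame(frame: bytes, bytes_per_sample: int, num_channels: int) -> None:
    """Raise ValueError for frames the server would reject with close code 4004."""
    if len(frame) < 12:
        raise ValueError("frame shorter than the 12-byte header")
    slot_id, t0_ms, sample_count, flags = struct.unpack(">BqHB", frame[:12])
    if flags != 0:
        raise ValueError("flags is reserved; send 0")
    expected = sample_count * bytes_per_sample * num_channels
    if len(frame) - 12 != expected:
        raise ValueError(
            f"payload is {len(frame) - 12} bytes, expected {expected} "
            f"({sample_count} samples x {bytes_per_sample} x {num_channels})"
        )

# A well-formed 2-sample int16 frame passes silently
frame = struct.pack(">BqHB", 0, 1760000000000, 2, 0) + struct.pack("<2h", 1, -2)
validate_frame(frame, bytes_per_sample=2, num_channels=1)
```

Running this in a debug build catches the `sample_count`-in-bytes mistake (the payload length check fails) without a round trip to the server.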