RGMP v2 Protocol

RGMP v2 (Rokoko General Motion Protocol, Version 2) is a binary protocol designed for real-time streaming of motion capture, sensor, and system data between applications. While it can be used for streaming data across a network, the protocol is primarily designed for inter-application communication.

It operates over a reliable, stream-oriented transport layer (TCP), using length-prefixed framing to separate messages.

RGMP v2 supports simultaneous streaming from multiple devices and data types, with a flexible grouping mechanism that allows different update rates and logical organization of data streams. Multiple clients can connect to an RGMP server at the same time, each receiving data from all devices connected to the server.

All values are encoded in little-endian format.


The communication in RGMP v2 is structured as a sequence of frames, each consisting of a header followed by a payload. The frame contains the following:

| Field | Type | Size (bytes) | Description |
|---|---|---|---|
| msg_prefix | uint32 | 4 | Frame type (e.g., 1 for stream definition) |
| msg_len | uint32 | 4 | Payload length (bytes) |
| payload | byte[] | msg_len | Binary data for the frame |
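
As an illustration of the frame layout above, the following minimal Python sketch packs and unpacks a single frame held in memory. The function names `pack_frame` and `unpack_frame` are hypothetical helpers introduced for this example and are not part of the protocol.

```python
import struct

# Header layout: msg_prefix (uint32) followed by msg_len (uint32), little-endian.
FRAME_HEADER = struct.Struct("<II")


def pack_frame(msg_prefix: int, payload: bytes) -> bytes:
    """Prepend the 8-byte header to a payload."""
    return FRAME_HEADER.pack(msg_prefix, len(payload)) + payload


def unpack_frame(buf: bytes) -> tuple[int, bytes, bytes]:
    """Return (msg_prefix, payload, remaining bytes).

    Raises ValueError if buf does not yet hold a complete frame.
    """
    if len(buf) < FRAME_HEADER.size:
        raise ValueError("incomplete header")
    msg_prefix, msg_len = FRAME_HEADER.unpack_from(buf, 0)
    end = FRAME_HEADER.size + msg_len
    if len(buf) < end:
        raise ValueError("incomplete payload")
    return msg_prefix, buf[FRAME_HEADER.size:end], buf[end:]
```

Returning the remaining bytes lets a caller process several frames that arrived back to back in one buffer.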

The msg_prefix field in the header indicates the type of frame being sent. The following frame types are defined in RGMP v2:

| msg_prefix | Frame Type | Description |
|---|---|---|
| 1 | Stream Definition | Describes all data streams, their types, semantics, and relationships |
| 2 | Data Frame | Contains values for one or more streams |
| 3 | Device Disconnect | Signals device disconnection |

sequenceDiagram
  participant Client
  participant Server
  Client->>Server: Connect (TCP)
  Server-->>Client: Stream Definition Frame (Binary/JSON)
  
  loop Streaming
    Server-->>Client: Data Frame(s) (Binary)
  end
  
  opt Graceful Teardown
    Server-->>Client: Device Disconnect Frame (Binary)
  end
  Client-->>Server: Close Connection

Workflow Rules:

  • Initialization: Upon device connection, the server sends a Stream Definition Frame to each connected client.

  • Error Handling: RGMP v2 relies entirely on TCP for packet reliability. Any application-level parsing errors must result in the connection being safely terminated.

  • Termination: Sessions are gracefully closed via a Device Disconnect Frame (to signal device disconnection) or standard TCP socket closure (to signal session end).
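
Because TCP delivers a byte stream rather than whole messages, a client typically has to buffer incoming bytes until a full frame is available. The sketch below shows one possible way to do that in Python. `FrameReader`, `ProtocolError`, and the `max_len` limit are hypothetical names chosen for this example. Treating an unknown `msg_prefix` as a parsing error is an assumption of this sketch.

```python
import struct
from enum import IntEnum


class FrameType(IntEnum):
    STREAM_DEFINITION = 1
    DATA = 2
    DEVICE_DISCONNECT = 3


class ProtocolError(Exception):
    """Parsing failed; the caller should close the TCP connection."""


class FrameReader:
    """Buffers bytes received from the socket and extracts complete frames."""

    HEADER = struct.Struct("<II")  # msg_prefix, msg_len (little-endian)

    def __init__(self, max_len: int = 1 << 24):
        # max_len is an assumed sanity limit, not part of the protocol.
        self._buf = bytearray()
        self._max_len = max_len

    def feed(self, chunk: bytes) -> list[tuple[FrameType, bytes]]:
        """Add received bytes and return every frame that is now complete."""
        self._buf += chunk
        frames = []
        while len(self._buf) >= self.HEADER.size:
            prefix, msg_len = self.HEADER.unpack_from(self._buf, 0)
            try:
                frame_type = FrameType(prefix)
            except ValueError:
                raise ProtocolError(f"unknown msg_prefix {prefix}") from None
            if msg_len > self._max_len:
                raise ProtocolError(f"msg_len {msg_len} exceeds limit")
            end = self.HEADER.size + msg_len
            if len(self._buf) < end:
                break  # wait for more bytes
            frames.append((frame_type, bytes(self._buf[self.HEADER.size:end])))
            del self._buf[:end]
        return frames
```

A caller would pass each chunk returned by `socket.recv` to `feed` and close the socket if `ProtocolError` is raised.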

The Stream Definition Frame is sent at the start of a session, and its payload is a UTF-8 encoded JSON string. Each definition describes the data from a real or synthetic device (e.g., smartsuit, smartglove) and contains:

  • Protocol Version & Name: Identifies protocol version and implementation.
  • Device Information: Type (e.g., smartsuit, smartglove), address, hub ID, device ID, and debug info.
  • Static Data: An array of static values (e.g., fixed transforms, calibration, hardware info) that do not change during the session. Each entry describes the data type, semantics, reference frames, and value.
  • Stream Groups: Each group contains a set of streams, a name, and an expected rate (Hz).

See rgmp_v2_definition_frame.schema.json for the full JSON schema.

Note: For this payload, the msg_len field in the header specifies the number of bytes in the string and does not include any null terminator (\0). The null terminator is not transmitted or counted in the length.
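
The length rule in the note above can be sketched as follows. The helper names are hypothetical, and the example dictionary is deliberately incomplete. It only demonstrates that `msg_len` counts UTF-8 bytes (not characters) and that no null terminator is appended.

```python
import json
import struct

STREAM_DEFINITION = 1  # msg_prefix for a Stream Definition Frame


def encode_stream_definition(definition: dict) -> bytes:
    """Serialize a definition to JSON and wrap it in a frame."""
    # ensure_ascii=False keeps non-ASCII characters as multi-byte UTF-8.
    payload = json.dumps(definition, ensure_ascii=False).encode("utf-8")
    # msg_len is the byte count of the string; no trailing "\0" is added.
    return struct.pack("<II", STREAM_DEFINITION, len(payload)) + payload


def decode_stream_definition(payload: bytes) -> dict:
    """Parse the payload of a Stream Definition Frame."""
    return json.loads(payload.decode("utf-8"))
```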

| Field | Type | Description |
|---|---|---|
| protocol_name | string | Name of the protocol (e.g., “RGMP”) |
| protocol_version | string | Protocol version (e.g., “2.0.0”) |
| device_id | number | Unique identifier for the device/session (must fit in a 32-bit unsigned integer) |
| device_type | string | Type of the device |
| timestamp_epoch | string | The epoch of the timestamps used in the data stream (e.g., “unix_epoch” or “device_boot”) |
| device_info | object | Device metadata (see below) |
| static_data | array | List of static data entries (see below) |
| groups | array | List of stream groups (see below) |

The timestamp_epoch field specifies the reference point for all timestamps in the data stream. Common values include:

  • "unix_epoch": Timestamps are in microseconds since the Unix Epoch.
  • "device_boot": Timestamps are in microseconds since the device was powered on.

The device_info object is optional and can be used to provide metadata about the source of the data.

The static_data section allows the protocol to communicate fixed values such as sensor offsets, calibration matrices, or hardware metadata. Each entry includes the same semantic fields as a stream, plus a value field containing the static value. This enables clients to receive all necessary context for interpreting streamed data.

Groups are a core concept in RGMP v2. Each group defines a logical collection of streams that are transmitted together at a specific rate. This enables, for example, high-frequency raw sensor data to be sent in one group, while lower-frequency processed pose data is sent in another. Each group has:

  • name: A human-readable name for the group.
  • expected_rate_hz: The expected update rate for this group (in Hz). A value of zero indicates that the group is event-driven and does not have a fixed update rate.
  • streams: An array of stream definitions, each describing the data type, semantics, and reference frames for a single stream.

Groups allow the protocol to efficiently support mixed-rate data and logical separation of different data types (e.g., raw IMU, pose, diagnostics). Data streams should be grouped according to their natural update rates and semantic relationships, since it is not possible to partially emit a group frame. If a partial update is needed, the group should be split into multiple groups.
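
To make the mixed-rate idea concrete, here is a hedged sketch of a definition with a high-rate raw group and a lower-rate pose group, written as a Python dictionary. The group names, rates, and device_id are illustrative values, the optional device_info object is omitted, and the authoritative structure is the JSON schema referenced earlier.

```python
import json

definition = {
    "protocol_name": "RGMP",
    "protocol_version": "2.0.0",
    "device_id": 1,  # illustrative value
    "device_type": "smartglove",
    "timestamp_epoch": "device_boot",
    "static_data": [],
    "groups": [
        {
            "name": "raw_imu",  # hypothetical group name
            "expected_rate_hz": 200,
            "streams": [
                {"data_type": "FLOAT[3]", "measure_type": "ANGULAR_VELOCITY",
                 "target_frame": "left_hand"},
                {"data_type": "FLOAT[3]", "measure_type": "PROPER_ACCELERATION",
                 "target_frame": "left_hand"},
            ],
        },
        {
            "name": "pose",  # hypothetical group name
            "expected_rate_hz": 60,
            "streams": [
                {"data_type": "FLOAT[7]", "measure_type": "TRANSFORM",
                 "target_frame": "left_hand", "reference_frame": "LTP_ENU"},
            ],
        },
    ],
}

# The group_id carried by a Data Frame is the zero-based index into "groups".
group_ids = {group["name"]: index for index, group in enumerate(definition["groups"])}
print(json.dumps(group_ids))
```

Splitting the raw and pose streams this way lets each group be emitted at its own rate, since a group frame always carries all of its streams.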

Stream, Group, and Static Data Entry Fields

The following fields are used for both streams (within groups) and static_data entries. This unified format allows for consistent semantics and metadata across all types of data in the protocol.

| Field | Type | Description |
|---|---|---|
| data_type | string | Data type (e.g., “FLOAT[3]”, “FLOAT[7]”, “UINT32”) |
| measure_type | string | Semantic meaning of the data (e.g., “TRANSFORM”, “STATUS_FLAGS”) |
| target_frame | string | Name of the target object or sensor the data describes |
| reference_frame | string | (Optional) The coordinate frame the data is measured relative to. If omitted, defaults to target_frame itself. Both forms (omitted, or explicitly equal to target_frame) are accepted on the wire and treated as semantically identical for uniqueness and lookup. |
| custom_label | string | Required when measure_type is CUSTOM; rejected on every other measure_type |
| bit_mapping | object | (Optional) Mapping of flag bits to names (for flag types) |
| value | varies | (Only for static_data) The static value (array, object, scalar) |

Note: If reference_frame is omitted, the data is assumed to be intrinsic to the target_frame (e.g., raw unrotated IMU data or hardware status flags). Setting reference_frame explicitly equal to target_frame is allowed and means the same thing.

| Data Type | Description | Binary Encoding |
|---|---|---|
| INT32 | 32-bit signed integer | int32_t (two’s complement) |
| UINT32 | 32-bit unsigned integer | uint32_t |
| INT64 | 64-bit signed integer | int64_t (two’s complement) |
| UINT64 | 64-bit unsigned integer | uint64_t |
| FLOAT | Single float value | float (IEEE 754 single-precision) |
| DOUBLE | Double float value | double (IEEE 754 double-precision) |

Vector and matrix types:

| Data Type | Description | Binary Encoding |
|---|---|---|
| TYPE[N] | Array of N values of base type TYPE | N values of base type TYPE |
| TYPE[N, M] | N by M matrix of base type TYPE | N*M values of base type TYPE, row-major order |

The dimensions N and M must each be at least 1; TYPE[0], TYPE[0,M], and TYPE[N,0] are not valid data types.
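
A consumer needs to turn a data_type string into an element count and a byte size. The following Python sketch shows one way to do so with the standard `struct` module. The function names are hypothetical, and tolerating optional whitespace inside the brackets is an assumption of this example.

```python
import re
import struct

# struct format characters for each base type (used with "<" for little-endian).
BASE_FORMATS = {
    "INT32": "i", "UINT32": "I",
    "INT64": "q", "UINT64": "Q",
    "FLOAT": "f", "DOUBLE": "d",
}

_DATA_TYPE_RE = re.compile(r"^([A-Z0-9]+)(?:\[\s*(\d+)\s*(?:,\s*(\d+)\s*)?\])?$")


def parse_data_type(data_type: str) -> tuple[str, int]:
    """Return (struct format character, number of elements)."""
    match = _DATA_TYPE_RE.match(data_type)
    if match is None or match.group(1) not in BASE_FORMATS:
        raise ValueError(f"invalid data_type: {data_type!r}")
    dims = [int(d) for d in match.groups()[1:] if d is not None]
    if any(d < 1 for d in dims):
        raise ValueError(f"dimensions must be at least 1: {data_type!r}")
    count = 1
    for d in dims:
        count *= d
    return BASE_FORMATS[match.group(1)], count


def byte_size(data_type: str) -> int:
    """Number of bytes one value of this data_type occupies on the wire."""
    fmt, count = parse_data_type(data_type)
    return struct.calcsize(f"<{count}{fmt}")
```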

| Measure Type | Description | Units |
|---|---|---|
| POSITION | 3D position vector | meters |
| ORIENTATION | 3D orientation (i.e., quaternion given in x,y,z,w order) | unitless |
| TRANSFORM | Rigid body transform (pose), given as a position followed by a quaternion (x,y,z,w) | position: meters, quaternion: unitless |
| ANGULAR_VELOCITY | Angular velocity vector | radians/second |
| LINEAR_VELOCITY | Linear velocity | meters/second |
| LINEAR_ACCELERATION | Linear acceleration | meters/second^2 |
| PROPER_ACCELERATION | Proper acceleration (includes gravity) | meters/second^2 |
| MAGNETIC_FIELD | Magnetic field vector | Gauss |
| STATUS_FLAGS | System or device status flags | unitless |
| CUSTOM | Custom semantic meaning defined by the user | unitless |

When defining a stream, target_frame and reference_frame can be any frame name, but the following standard identifiers are recommended where applicable:

  • LTP_NED: Local Tangent Plane (North, East, Down). Right-handed.
  • LTP_ENU: Local Tangent Plane (East, North, Up). Right-handed.

Frame names should be written in snake_case, with the exception of the standard frames above. For example, left_hand, right_hand, head, etc.

A rotation or transform defined with the target_frame and reference_frame describes how a vector can be transformed from the target frame to the reference frame. That is, if $p_{\text{target}}$ is the position of object A in the target frame, and the stream defines a transform from target_frame to reference_frame, then the position of object A in the reference frame can be computed as the following matrix-vector product:

$$p_{\text{reference}} = T \, p_{\text{target}}$$

where $p_{\text{target}}$ is represented as a homogeneous coordinate vector and $T$ is the homogeneous transformation, which can be constructed from the position and orientation data in the data stream. For example, if the stream defines a TRANSFORM with a position $t$ and a quaternion $q$, $T$ can be constructed as:

$$T = \begin{bmatrix} R(q) & t \\ 0_{1 \times 3} & 1 \end{bmatrix}$$

where $R(q)$ is the rotation matrix corresponding to the quaternion $q$.
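
The construction described above can be sketched in plain Python. This example assumes a unit quaternion in the common Hamilton convention, which the text does not state explicitly, and the function names are hypothetical.

```python
import math


def quaternion_to_matrix(x, y, z, w):
    """3x3 rotation matrix for a unit quaternion (assumed Hamilton convention)."""
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def homogeneous_matrix(transform):
    """4x4 matrix built from a TRANSFORM value: position, then quaternion (x,y,z,w)."""
    tx, ty, tz, x, y, z, w = transform
    R = quaternion_to_matrix(x, y, z, w)
    return [R[0] + [tx], R[1] + [ty], R[2] + [tz], [0.0, 0.0, 0.0, 1.0]]


def transform_point(transform, p_target):
    """Map a point from the target frame to the reference frame."""
    T = homogeneous_matrix(transform)
    p = list(p_target) + [1.0]  # homogeneous coordinates
    out = [sum(T[i][j] * p[j] for j in range(4)) for i in range(4)]
    return tuple(out[:3])


# Example: 90 degree rotation about z plus a translation of (1, 2, 3).
s = math.sqrt(0.5)
p_reference = transform_point((1.0, 2.0, 3.0, 0.0, 0.0, s, s), (1.0, 0.0, 0.0))
```

With these inputs the x axis of the target frame maps onto the y axis of the reference frame before the translation is added.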

Within a single group, every stream must be uniquely identified. The uniqueness key depends on the measure_type:

  • Standard measure types (POSITION, ORIENTATION, TRANSFORM, ANGULAR_VELOCITY, LINEAR_VELOCITY, LINEAR_ACCELERATION, PROPER_ACCELERATION, MAGNETIC_FIELD, STATUS_FLAGS): keyed by (measure_type, target_frame, reference_frame). Two POSITION streams describing the same target_frame but expressed in different reference_frames are distinct streams.
  • CUSTOM: keyed by (measure_type, target_frame, reference_frame, custom_label).

Optional fields participate in the key as their presence: a stream with no reference_frame has a different key from one with reference_frame: "LTP_ENU". Lookup at the consumer is strict — passing no reference_frame will not match a stream that declared one.

Two streams in the same group sharing the same key are a protocol error.

The custom_label field is the disambiguator for streams whose measure_type is CUSTOM. Two CUSTOM streams in the same group with the same target_frame and reference_frame are distinguished by their custom_label.

Validity is bidirectional:

  • custom_label is required when measure_type is CUSTOM. A CUSTOM stream with no label is a protocol error.
  • custom_label is rejected on every other measure_type. Setting it on POSITION, STATUS_FLAGS, or any standard measure is a protocol error.

The label does not affect the binary encoding of the data and is purely informational — it surfaces only in the JSON Stream Definition Frame, where it tells consumers what the otherwise-opaque CUSTOM stream represents ("battery_pct", "emf_residual", etc.).
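
The uniqueness and custom_label rules above could be checked when a definition is parsed, for instance as in this sketch. The function names are hypothetical. Following the field description, an omitted reference_frame is normalized to target_frame so that both forms yield the same key.

```python
def stream_key(stream: dict) -> tuple:
    """Build the uniqueness key of a stream, enforcing the custom_label rules."""
    measure_type = stream["measure_type"]
    has_label = "custom_label" in stream
    if measure_type == "CUSTOM" and not has_label:
        raise ValueError("CUSTOM stream requires custom_label")
    if measure_type != "CUSTOM" and has_label:
        raise ValueError(f"custom_label is not allowed on {measure_type}")
    target = stream["target_frame"]
    # Omitted reference_frame means the same as reference_frame == target_frame.
    reference = stream.get("reference_frame", target)
    key = (measure_type, target, reference)
    if has_label:
        key += (stream["custom_label"],)
    return key


def validate_group(streams: list[dict]) -> None:
    """Raise ValueError if two streams in one group share a key."""
    seen = set()
    for stream in streams:
        key = stream_key(stream)
        if key in seen:
            raise ValueError(f"duplicate stream key: {key}")
        seen.add(key)
```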

For streams with measure_type of STATUS_FLAGS, the bit_mapping field must be provided. This field is an object that maps each bit index (starting from 0) to a human-readable name for that flag. For example:

"bit_mapping": {
  "0": "is_tracking",
  "1": "is_calibrated",
  "2": "has_error"
}
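
As a small illustration, a received STATUS_FLAGS value could be expanded into named booleans using the mapping above. The helper name `decode_status_flags` is hypothetical.

```python
def decode_status_flags(value: int, bit_mapping: dict[str, str]) -> dict[str, bool]:
    """Map each named bit of an integer flag value to True or False."""
    return {name: bool((value >> int(bit)) & 1) for bit, name in bit_mapping.items()}


bit_mapping = {"0": "is_tracking", "1": "is_calibrated", "2": "has_error"}
flags = decode_status_flags(0b101, bit_mapping)
```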

For entries in the static_data array, the value field must be provided and contain the static value corresponding to the defined data type. Array data must be provided as a flat JSON array, even for matrix types (e.g., a 3x3 matrix should be provided as an array of 9 values in row-major order).
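
A consumer that wants a nested matrix can rebuild it from the flat row-major array. The sketch below uses a hypothetical static_data entry; its label and frame name are invented for illustration.

```python
def as_rows(flat, n_rows: int, n_cols: int) -> list[list]:
    """Split a flat row-major array into n_rows lists of n_cols values."""
    if len(flat) != n_rows * n_cols:
        raise ValueError(f"expected {n_rows * n_cols} values, got {len(flat)}")
    return [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]


entry = {
    "data_type": "FLOAT[3, 3]",
    "measure_type": "CUSTOM",
    "custom_label": "calibration_matrix",  # hypothetical label
    "target_frame": "left_hand",
    "value": [1, 0, 0, 0, 1, 0, 0, 0, 1],  # flat, row-major
}
rows = as_rows(entry["value"], 3, 3)
```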

The Data Frame is sent repeatedly during streaming and contains the data values for all streams in the stream group given by group_id, in the order defined in the stream definition frame.

Data Frame Header

| Field | Type | Size (bytes) | Description |
|---|---|---|---|
| device_id | UINT32 | 4 | Device/session identifier |
| group_id | UINT32 | 4 | Group index corresponding to the stream definition frame (starting from 0) |
| timestamp_us | UINT64 | 8 | Capture timestamp in microseconds since the defined epoch |
| payload | byte[] | according to stream definition | Binary data for all streams in the group |

Payload

| Field | Type | Description |
|---|---|---|
| Stream 1 | varies | Data for stream 1 (type and order defined by the stream definition) |
| Stream 2 | varies | Data for stream 2 (type and order defined by the stream definition) |

All stream values for the specified group must be present in the frame, in the order defined in the stream definition. Each value is packed according to its data_type.

All fields within the Data Frame are tightly packed without implicit memory padding. Because the header is exactly 16 bytes and every defined base type is 4 or 8 bytes wide, every value in the payload starts at an offset that is a multiple of 4 bytes. Note that 64-bit values (INT64, UINT64, DOUBLE) are therefore guaranteed 4-byte alignment, but not necessarily 8-byte alignment.
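
Putting the header and packing rules together, a Data Frame could be decoded roughly as follows. This is a sketch that assumes the 16-byte Data Frame header sits at the start of the frame payload. The `groups` argument, a list of data_type lists indexed by group_id, is a hypothetical structure that a client would derive from the Stream Definition Frame.

```python
import struct

DATA_HEADER = struct.Struct("<IIQ")  # device_id, group_id, timestamp_us
BASE_FORMATS = {"INT32": "i", "UINT32": "I", "INT64": "q",
                "UINT64": "Q", "FLOAT": "f", "DOUBLE": "d"}


def stream_format(data_type: str) -> tuple[int, str]:
    """Return (element count, struct format character) for a data_type string."""
    base, _, dims = data_type.partition("[")
    count = 1
    for d in dims.rstrip("]").split(","):
        if d.strip():
            count *= int(d)
    return count, BASE_FORMATS[base.strip()]


def decode_data_frame(payload: bytes, groups: list[list[str]]):
    """Return (device_id, group_id, timestamp_us, [values per stream])."""
    if len(payload) < DATA_HEADER.size:
        raise ValueError("payload shorter than Data Frame header")
    device_id, group_id, timestamp_us = DATA_HEADER.unpack_from(payload, 0)
    if group_id >= len(groups):
        raise ValueError(f"unknown group_id {group_id}")
    formats = [stream_format(t) for t in groups[group_id]]
    # "<" selects little-endian with no padding, matching the tight packing.
    fmt = "<" + "".join(f"{count}{char}" for count, char in formats)
    if DATA_HEADER.size + struct.calcsize(fmt) != len(payload):
        raise ValueError("payload size does not match the stream definition")
    flat = struct.unpack_from(fmt, payload, DATA_HEADER.size)
    values, pos = [], 0
    for count, _ in formats:
        values.append(flat[pos:pos + count])
        pos += count
    return device_id, group_id, timestamp_us, values
```

Rejecting a payload whose size differs from the one implied by the definition is a design choice of this sketch; it follows from the rule that every stream of the group must be present.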

The timestamp_us field must represent the Time of Validity (hardware capture time) of the measurements, not the time of network packet construction or transmission. The reference epoch of this timestamp is defined by the timestamp_epoch field in the Stream Definition Frame.

Note: Timestamps must be strictly monotonically increasing within a single session.
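
A receiver may want to verify the monotonicity rule defensively. The sketch below is one hypothetical way to do that. The text does not say whether the rule is scoped per device or per group, so the caller chooses the tracking key (for example a device_id, or a (device_id, group_id) tuple).

```python
class TimestampChecker:
    """Tracks the last timestamp seen per caller-chosen key."""

    def __init__(self):
        self._last = {}

    def check(self, key, timestamp_us: int) -> bool:
        """Return True if timestamp_us is strictly greater than the previous one."""
        last = self._last.get(key)
        if last is not None and timestamp_us <= last:
            return False
        self._last[key] = timestamp_us
        return True

    def reset(self, key) -> None:
        """Forget a key, e.g. after its device disconnects."""
        self._last.pop(key, None)
```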

The Device Disconnect Frame is sent when a device is disconnected. It is a simple binary frame that signals the removal of a device from the session.

Device Disconnect Payload

| Field | Type | Size (bytes) | Description |
|---|---|---|---|
| device_id | uint32 | 4 | Device/session identifier |

Upon receiving this frame, the receiver should remove or mark the device as disconnected and clean up any associated resources. A new device with the same device_id may connect in the future, but it should be treated as a new session.
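
Handling this frame on the receiver side might look like the following sketch. The `devices` dictionary is a hypothetical per-connection registry keyed by device_id, and the function name is likewise an assumption.

```python
import struct


def handle_device_disconnect(payload: bytes, devices: dict) -> int:
    """Parse a Device Disconnect payload and drop the device's state."""
    if len(payload) != 4:
        raise ValueError(f"expected 4-byte payload, got {len(payload)}")
    (device_id,) = struct.unpack("<I", payload)
    # Discard the stored definition so that a later device reusing this
    # device_id starts as a new session.
    devices.pop(device_id, None)
    return device_id
```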