ZeroMQ-based Network Protocol¶
Defines the formats of output data transmitted using ZeroMQ.
Configuration Endpoint (REQ-REP)¶
Nimble exposes a “config” socket that will reply with JSON to allow consumers to query the application’s configuration and to discover the dynamic data endpoints.
ZeroMQ socket pattern: REP
Endpoint type: TCP (e.g. tcp://localhost:22951)
Default port: 22951 (this can be changed in Nimble’s configuration section)
Consumers must use a REQ socket and send the UTF-8 encoded string "config" to query the configuration.
This endpoint will always remain up as long as Nimble is initialized and running.
Consumers may remain connected to the endpoint and send additional queries at any time.
Example JSON¶
Upon receiving the "config" request string, Nimble will respond with a UTF-8 encoded JSON configuration that adheres to the following format:
{
"version": "1.0.0",
"dataPort": 51983,
"streams": {
"0": {
"port": 49834
}
},
"channels": {
"0": {
"stream": "0"
},
"1": {
"stream": "0"
},
"2": {
"stream": null
}
},
"grid": [
["0", "2"]
]
}
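As a sketch of how a consumer might use this reply, the snippet below parses the example config and resolves a channel to the TCP port of its source video stream. The `Config` interface matches the TypeScript Definition later in this document; the helper name `streamPortForChannel` is illustrative and not part of Nimble.

```typescript
interface Config {
  version: string;
  dataPort: number;
  streams: { [streamID: string]: { port: number } };
  channels: { [channelID: string]: { stream: string | null } };
  grid?: string[][];
}

// Illustrative helper: look up the video stream port for a channel.
function streamPortForChannel(config: Config, channelID: string): number | null {
  const channel = config.channels[channelID];
  if (channel === undefined || channel.stream === null) {
    return null; // Unknown channel, or a channel with no source video stream.
  }
  return config.streams[channel.stream].port;
}

// The example config JSON from above.
const config: Config = JSON.parse(`{
  "version": "1.0.0",
  "dataPort": 51983,
  "streams": { "0": { "port": 49834 } },
  "channels": { "0": { "stream": "0" }, "1": { "stream": "0" }, "2": { "stream": null } },
  "grid": [["0", "2"]]
}`);

console.log(streamPortForChannel(config, "1")); // → 49834
console.log(streamPortForChannel(config, "2")); // → null
```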
Version property¶
A semantic versioning string for this JSON format.
Data port property¶
The port number of the Data Publishing Endpoint.
Streams property¶
A map/dictionary of video streams and the port number of their endpoints.
Channels property¶
A map/dictionary of analytics channels, and their source video streams.
If the stream property is a string, then it acts as a Stream ID reference into the streams property above. If the stream property is null, then a timestamp will NOT be sent in the multipart Channel message.
Grid property¶
A 2-dimensional array of Channel IDs. Specifically, a list of rows.
Each string in the grid is a Channel ID reference into the channels property above.
This property is optional and is simply a recommendation from Nimble on how to lay out a video wall of the listed channels. Consumers may ignore this property and lay out channels however they choose.
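A consumer that does follow the recommendation can flatten the grid into per-channel wall positions, as in this sketch. The `Placement` type and function name are illustrative; only the grid shape (a list of rows of channel IDs) comes from Nimble.

```typescript
interface Placement {
  channelID: string;
  row: number;
  col: number;
}

// Illustrative helper: turn the optional grid into (row, col) placements.
function gridPlacements(grid: string[][] | undefined): Placement[] {
  if (grid === undefined) return []; // The grid is optional; fall back to a custom layout.
  const placements: Placement[] = [];
  grid.forEach((rowIDs, row) => {
    rowIDs.forEach((channelID, col) => {
      placements.push({ channelID, row, col });
    });
  });
  return placements;
}

console.log(gridPlacements([["0", "2"]]));
// → [{ channelID: "0", row: 0, col: 0 }, { channelID: "2", row: 0, col: 1 }]
```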
TypeScript Definition¶
interface Config {
/**
* The version of the JSON format.
*/
version: string;
/**
* The ZeroMQ PUB-SUB data port.
*/
dataPort: number;
/**
* The fragmented MP4 video streams.
*/
streams: {
[streamID: string]: {
/**
* The TCP port to connect to for receiving video data.
*/
port: number,
},
};
/**
* A list of channels outputted by the pipeline.
*/
channels: {
[channelID: string]: {
/**
* The source video stream.
*/
stream: string | null,
},
};
/**
* An optional channel grid layout. Each string is a channel ID.
*/
grid?: string[][];
}
Data Publishing Endpoint (PUB-SUB)¶
Nimble uses a data socket to continuously publish real-time video analytics results to subscribers.
ZeroMQ socket pattern: PUB
Endpoint type: TCP (e.g. tcp://localhost:51983)
Port: dynamic (see the Data port property in the config JSON)
Consumers must use a SUB socket to receive video analytics results in the form of JSON metadata, as well as Nimble’s internal performance data.
Topics¶
There are three data categories: Channel, FPS, and Latency.
Nimble uses “topic prefixes” to specify which data category each sent message belongs to. See ZeroMQ PUB-SUB Topics for how ZeroMQ implements topics and multipart messages.
Multipart Message Formats¶
"channel/{id}" {metadata}
"channel/{id}" {metadata} {timestamp}
"channel/{id}" {metadata} {jpeg}
"channel/{id}" {metadata} {timestamp} {jpeg}
"fps" {fps}
"latency" {latency}
Channel topic¶
A Channel message is sent when Nimble finishes processing a new video frame.
First, the channel topic is appended using the string format channel/{id}, where {id} is a placeholder for a unique channel ID.
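Extracting the channel ID from a received topic frame can be done with a small helper like this sketch. Only the channel/{id} topic format comes from Nimble; the function name is illustrative.

```typescript
// Illustrative helper: recover the channel ID from a topic frame,
// or null when the topic belongs to another category (e.g. "fps").
function parseChannelTopic(topic: string): string | null {
  const prefix = "channel/";
  if (!topic.startsWith(prefix)) return null;
  return topic.slice(prefix.length);
}

console.log(parseChannelTopic("channel/0")); // → "0"
console.log(parseChannelTopic("fps"));       // → null
```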
Custom JSON Metadata¶
Next, custom JSON metadata is encoded as UTF-8 and appended to the multipart message. The format of this metadata is dependent on the type of the running Use Case and contains inferred analytics data extracted from the video content. This metadata can be used to render overlays on the source video stream.
Important: Any position or size in the metadata is normalized from 0 to 1 relative to the dimensions of the video. 0,0 indicates top-left. For example, if the video dimensions are 1920x1080 pixels, then 0.5,0.2 encodes the position 960,216 in pixels.
This allows the video to be transformed without requiring the metadata to be parsed and transformed too.
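A renderer can map a normalized metadata position back to pixel coordinates with a one-line conversion, sketched below. The function name is illustrative.

```typescript
// Illustrative helper: convert a normalized position (0..1, top-left origin)
// into pixel coordinates for the current video dimensions.
function denormalize(
  x: number, y: number,          // Normalized position in [0, 1].
  width: number, height: number, // Video dimensions in pixels.
): { x: number; y: number } {
  return { x: x * width, y: y * height };
}

console.log(denormalize(0.5, 0.2, 1920, 1080)); // → { x: 960, y: 216 }
```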
Media Presentation Timestamp (and/or JPEG image)¶
A Channel message can be composed in 4 different ways depending on the static configuration or dynamic runtime state of the channel:
"channel/{id}" {metadata}
"channel/{id}" {metadata} {timestamp}
"channel/{id}" {metadata} {jpeg}
"channel/{id}" {metadata} {timestamp} {jpeg}
If the Channel message only contains 2 parts, then the channel may have a static configuration to only output metadata.
If the Channel message contains exactly 3 parts, then either a media presentation timestamp or a JPEG-encoded image has been appended to the multipart message. If the channel is configured with a non-null Stream ID, then this third part will be a timestamp, otherwise it will be a JPEG-encoded image.
If the Channel message contains 4 parts, then the channel is configured to output both a media timestamp, and a JPEG-encoded image.
Consumers should NOT make assumptions about the format of messages received from the same channel; it is allowed to change at runtime. For example, a Use Case could choose to only append a JPEG-encoded image on a conditional trigger. In this situation, the Channel messages could switch between the 1st and 3rd format, or the 2nd and 4th format.
The media presentation timestamp identifies a processed frame from the input video source. It’s relative to the beginning of the video, starts at zero, and is measured in seconds. It’s encoded as a JSON number primitive in UTF-8 when it’s appended to the multipart message.
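The decoding rules above can be sketched as a helper that inspects the part count of a received Channel message. Here `parts` includes the topic frame (so the lengths 2, 3, and 4 match the formats listed above), `hasStream` reflects whether the channel's config has a non-null Stream ID, and all type and function names are illustrative. The sample metadata in the usage example is fabricated, since metadata formats are Use-Case dependent.

```typescript
interface ChannelMessage {
  metadata: unknown;         // Use-Case-specific JSON metadata.
  timestamp: number | null;  // Media presentation timestamp in seconds.
  jpeg: Buffer | null;       // JPEG-encoded image bytes.
}

// Illustrative helper: decode the variable parts of a Channel multipart message.
function decodeChannelMessage(parts: Buffer[], hasStream: boolean): ChannelMessage {
  // parts[0] is the "channel/{id}" topic frame, parts[1] the JSON metadata.
  const metadata = JSON.parse(parts[1].toString("utf-8"));
  let timestamp: number | null = null;
  let jpeg: Buffer | null = null;
  if (parts.length === 4) {
    // Both a timestamp and a JPEG image follow the metadata.
    timestamp = JSON.parse(parts[2].toString("utf-8"));
    jpeg = parts[3];
  } else if (parts.length === 3) {
    // One extra part: a timestamp if the channel has a source stream,
    // otherwise a JPEG image.
    if (hasStream) {
      timestamp = JSON.parse(parts[2].toString("utf-8"));
    } else {
      jpeg = parts[2];
    }
  }
  return { metadata, timestamp, jpeg };
}

// Usage with a fabricated 3-part message from a channel with a source stream:
const msg = decodeChannelMessage(
  [Buffer.from("channel/0"), Buffer.from(`{"objects":[]}`), Buffer.from("12.5")],
  true,
);
console.log(msg.timestamp); // → 12.5
```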
FPS and Latency topic¶
An FPS or Latency message is sent when Nimble computes a new estimate for the corresponding performance metric. The fps and latency topics use the simple topic strings fps and latency, respectively.
The fps and latency values are encoded as JSON number primitives in UTF-8 and appended to the multipart message. Latency values are measured in milliseconds.
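Decoding these messages is a matter of reading the topic frame and parsing the payload as a JSON number, as in this sketch. The function name is illustrative.

```typescript
// Illustrative helper: decode an FPS or Latency multipart message.
// parts[0] is the topic frame ("fps" or "latency"); parts[1] is the
// UTF-8 encoded JSON number primitive.
function decodeMetric(parts: Buffer[]): { topic: string; value: number } {
  const topic = parts[0].toString("utf-8");
  const value: number = JSON.parse(parts[1].toString("utf-8"));
  return { topic, value }; // Latency values are in milliseconds.
}

console.log(decodeMetric([Buffer.from("latency"), Buffer.from("42.5")]));
// → { topic: "latency", value: 42.5 }
```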
Video Stream Endpoints (Plain TCP)¶
Nimble proxies each input video source (RTSP, YouTube, MP4 file, etc.) into a fragmented MP4 byte stream. Fragmented MP4 is streamable MP4, so it can be played in real-time as soon as a keyframe is received.
Consumers must connect to the stream endpoints using a plain TCP socket. The only data sent over the TCP connection is the raw fragmented MP4 byte stream; there is no message framing or packetization.
Important: Nimble will close a video stream endpoint when its input video source reaches end-of-stream. If the configuration endpoint or data publishing endpoint is still up, then the input video source is most likely temporarily disconnected or restarting. Consumers should use this close event to reset their MP4 decoder and attempt to reconnect to the video stream endpoint.
Web-compatibility¶
Fragmented MP4 conforms to the ISO BMFF Byte Stream Format defined by the W3C, which is supported by Media Source Extensions and available to use through the MSE API.
The MSE API documentation is also available from MDN Web Docs: Media Source API