ZeroMQ-based Network Protocol

Defines the formats of output data transmitted using ZeroMQ.

Configuration Endpoint (REQ-REP)

Nimble exposes a “config” socket that will reply with JSON to allow consumers to query the application’s configuration and to discover the dynamic data endpoints.

Consumers must use a REQ socket and send a request using the UTF-8 encoded string "config" to query for the configuration. This endpoint will always remain up as long as Nimble is initialized and running. Consumers may remain connected to the endpoint and send additional queries at any time.
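For example, a consumer might query the endpoint like this. This is a minimal sketch assuming the `zeromq` npm package (zeromq.js v6); the endpoint address and function name are illustrative, not part of the protocol:

```typescript
// Query Nimble's config endpoint over a REQ socket. Assumes the `zeromq`
// npm package (zeromq.js v6) is installed; the address is an example only.
async function fetchConfig(endpoint: string): Promise<unknown> {
  const zmq = require("zeromq");        // untyped require so the sketch stands alone
  const sock = new zmq.Request();
  sock.connect(endpoint);               // e.g. "tcp://localhost:5555"
  await sock.send("config");            // the UTF-8 request string
  const [reply] = await sock.receive(); // single-frame UTF-8 JSON reply
  sock.close();
  return JSON.parse(reply.toString("utf8"));
}
```

Because the endpoint stays up while Nimble runs, the same socket may be reused for repeated queries.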

Example JSON

Upon receiving the "config" request string, Nimble will respond with a UTF-8 encoded JSON configuration that adheres to the following format:

{
  "version": "1.0.0",
  "dataPort": 51983,
  "streams": {
    "0": {
      "port": 49834
    }
  },
  "channels": {
    "0": {
      "stream": "0"
    },
    "1": {
      "stream": "0"
    },
    "2": {
      "stream": null
    }
  },
  "grid": [
    ["0", "2"]
  ]
}

Version property

A semantic versioning string for this JSON format.

Data port property

The port number of the Data Publishing Endpoint.

Streams property

A map/dictionary of video streams and the port numbers of their endpoints.

Channels property

A map/dictionary of analytics channels and their source video streams. If the stream property is a string, it acts as a Stream ID reference into the streams property above. If the stream property is null, then a timestamp will NOT be sent in the multipart Channel message.

Grid property

A 2-dimensional array of Channel IDs. Specifically, a list of rows. Each string in the grid is a Channel ID reference into the channels property above. This property is optional and simply a recommendation from Nimble on how to lay out a video wall of the listed channels. Consumers may ignore this property and lay out channels however they choose.

TypeScript Definition

interface Config {

    /**
     * The version of the JSON format.
     */
    version: string;

    /**
     * The ZeroMQ PUB-SUB data port.
     */
    dataPort: number;

    /**
     * The fragmented MP4 video streams.
     */
    streams: {
        [streamID: string]: {
            /**
             * The TCP port to connect to in order to receive video data.
             */
            port: number,
        },
    };

    /**
     * A list of channels output by the pipeline.
     */
    channels: {
        [channelID: string]: {
            /**
             * The source video stream.
             */
            stream: string | null,
        },
    };

    /**
     * An optional channel grid layout. Each string is a channel ID.
     */
    grid?: string[][];

}
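To illustrate how the channels and streams maps relate, the following hypothetical helper resolves the video-stream TCP port for a given channel, returning null for channels whose stream property is null:

```typescript
interface Config {
  version: string;
  dataPort: number;
  streams: { [streamID: string]: { port: number } };
  channels: { [channelID: string]: { stream: string | null } };
  grid?: string[][];
}

// Resolve the video-stream TCP port for a channel. Returns null when the
// channel does not exist or has no source stream (stream: null).
function streamPortForChannel(config: Config, channelID: string): number | null {
  const channel = config.channels[channelID];
  if (!channel || channel.stream === null) return null;
  return config.streams[channel.stream]?.port ?? null;
}
```

With the example JSON above, channels "0" and "1" both resolve to port 49834, while channel "2" resolves to null.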

Data Publishing Endpoint (PUB-SUB)

Nimble uses a data socket to continuously publish real-time video analytics results to subscribers.

  • ZeroMQ socket pattern: PUB

  • Endpoint type: TCP (e.g. tcp://localhost:51983)

  • Port: dynamic (see the Data port property in the config JSON)

Consumers must use a SUB socket to receive video analytics results in the form of JSON metadata, along with Nimble's internal performance data.
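A minimal subscriber sketch, assuming the `zeromq` npm package (zeromq.js v6); the topic strings and callback are illustrative:

```typescript
// Subscribe to Nimble's data publishing endpoint. Assumes the `zeromq`
// npm package (zeromq.js v6); the endpoint comes from the config JSON's
// dataPort, e.g. "tcp://localhost:51983".
async function subscribeToTopics(
  endpoint: string,
  topics: string[],                     // e.g. ["channel/0", "fps", "latency"]
  onMessage: (topic: string, frames: Buffer[]) => void,
): Promise<void> {
  const zmq = require("zeromq");        // untyped require so the sketch stands alone
  const sock = new zmq.Subscriber();
  sock.connect(endpoint);
  for (const topic of topics) sock.subscribe(topic);
  // Each received message is multipart: the topic frame, then data frames.
  for await (const [topic, ...frames] of sock) {
    onMessage(topic.toString("utf8"), frames);
  }
}
```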

Topics

There are three data categories: Channel, FPS, and Latency.

Nimble uses “topic prefixes” to specify which data category each sent message belongs to. See ZeroMQ PUB-SUB Topics for how ZeroMQ implements topics and multipart messages.

Multipart Message Formats

"channel/{id}" {metadata}
"channel/{id}" {metadata} {timestamp}
"channel/{id}" {metadata} {jpeg}
"channel/{id}" {metadata} {timestamp} {jpeg}

"fps" {fps}

"latency" {latency}

Channel topic

A Channel message is sent when Nimble finishes processing a new video frame.

First, the channel topic is appended to the multipart message using this string format: channel/{id}

{id} is a placeholder for a unique channel ID.

Custom JSON Metadata

Next, custom JSON metadata is encoded as UTF-8 and appended to the multipart message. The format of this metadata is dependent on the type of the running Use Case and contains inferred analytics data extracted from the video content. This metadata can be used to render overlays on the source video stream.

Important: Any position or size in the metadata is normalized from 0 to 1 relative to the dimensions of the video. 0,0 indicates top-left. For example, if the video dimensions are 1920x1080 pixels, then 0.5,0.2 encodes the position 960,216 in pixels. This allows the video to be transformed without requiring the metadata to be parsed and transformed too.
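The normalization above can be undone with a one-line conversion (hypothetical helper name):

```typescript
// Convert a normalized position (0..1, origin at top-left) to pixel
// coordinates for a video of the given dimensions.
function toPixels(x: number, y: number, width: number, height: number) {
  return { px: x * width, py: y * height };
}

// For a 1920x1080 video, the normalized position 0.5,0.2 maps to 960,216:
// toPixels(0.5, 0.2, 1920, 1080) → { px: 960, py: 216 }
```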

Media Presentation Timestamp (and/or JPEG image)

A Channel message can be composed in 4 different ways depending on the static configuration or dynamic runtime state of the channel:

"channel/{id}" {metadata}
"channel/{id}" {metadata} {timestamp}
"channel/{id}" {metadata} {jpeg}
"channel/{id}" {metadata} {timestamp} {jpeg}

If the Channel message contains only 2 parts, then the channel may be statically configured to output only metadata.

If the Channel message contains exactly 3 parts, then either a media presentation timestamp or a JPEG-encoded image has been appended to the multipart message. If the channel is configured with a non-null Stream ID, then this third part will be a timestamp, otherwise it will be a JPEG-encoded image.

If the Channel message contains 4 parts, then the channel is configured to output both a media timestamp, and a JPEG-encoded image.

Consumers should NOT make assumptions about the format of messages received from the same channel; the format is allowed to change at runtime. For example, a Use Case could choose to only append a JPEG-encoded image on a conditional trigger. In this situation, the Channel messages could switch between the 1st and 3rd format, or the 2nd and 4th format.

The media presentation timestamp identifies a processed frame from the input video source. It’s relative to the beginning of the video, starts at zero, and is measured in seconds. It’s encoded as a JSON number primitive in UTF-8 when it’s appended to the multipart message.
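The part-count rules above can be sketched as a parsing function. The names here are hypothetical; `hasStream` reflects whether the channel's config has a non-null Stream ID, which disambiguates the 3-part case:

```typescript
interface ChannelMessage {
  channelID: string;
  metadata: unknown;   // Use Case-specific JSON metadata
  timestamp?: number;  // media presentation timestamp, in seconds
  jpeg?: Buffer;       // JPEG-encoded image
}

// Interpret a multipart Channel message per the 4 possible formats.
function parseChannelMessage(frames: Buffer[], hasStream: boolean): ChannelMessage {
  const channelID = frames[0].toString("utf8").slice("channel/".length);
  const msg: ChannelMessage = {
    channelID,
    metadata: JSON.parse(frames[1].toString("utf8")),
  };
  if (frames.length === 3) {
    // The third part is a timestamp when the channel has a source stream,
    // otherwise it is a JPEG-encoded image.
    if (hasStream) msg.timestamp = Number(frames[2].toString("utf8"));
    else msg.jpeg = frames[2];
  } else if (frames.length === 4) {
    msg.timestamp = Number(frames[2].toString("utf8"));
    msg.jpeg = frames[3];
  }
  return msg;
}
```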

FPS and Latency topic

An FPS or Latency message is sent when Nimble computes a new estimate for the corresponding performance metric.

The fps and latency topics use the simple names fps and latency, respectively.

The fps and latency values are encoded as JSON number primitives in UTF-8 and appended to the multipart message. Latency values are measured in milliseconds.

Video Stream Endpoints (Plain TCP)

Nimble proxies each input video source (RTSP, YouTube, MP4 file, etc) into a fragmented MP4 byte stream. Fragmented MP4 is streamable MP4, so it can be played in real-time as soon as a keyframe is received.

Consumers must connect to the stream endpoints using a plain TCP socket. The only data sent over the TCP connection is the raw fragmented MP4 byte stream; there is no message framing or packetization.

Important: Nimble will close a video stream endpoint when its input video source reaches end-of-stream. If the configuration endpoint or data publishing endpoint is still up, then the input video source is most likely temporarily disconnected or restarting. Consumers should use this close event to reset their MP4 decoder and attempt to reconnect to the video stream endpoint.
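A consumer sketch using Node's built-in `net` module; the callback names are illustrative. The close handler is where a consumer would reset its MP4 decoder and schedule a reconnect:

```typescript
import * as net from "net";

// Consume a video stream endpoint over plain TCP. onData receives raw
// fragmented-MP4 bytes (no framing); onClose fires when the endpoint
// closes, which is the consumer's cue to reset its decoder and reconnect.
function consumeStream(
  port: number,
  host: string,
  onData: (chunk: Buffer) => void,
  onClose: () => void,
): net.Socket {
  const sock = net.connect(port, host);
  sock.on("data", onData);
  sock.on("error", () => { /* 'close' fires afterwards */ });
  sock.on("close", onClose);
  return sock;
}
```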

Web-compatibility

Fragmented MP4 conforms to the ISO BMFF Byte Stream Format defined by the W3C, which is supported by Media Source Extensions and available to use through the MSE API.

The MSE API documentation is also available from MDN Web Docs: Media Source API