Creating a custom GStreamer plugin for NVIDIA DeepStream

NVIDIA DeepStream provides an efficient framework for building production-ready multi-stream video analytics applications. Hardware-accelerated processing lets developers reach high throughput. However, the standard elements do not cover every scenario, such as running custom vision-language models, implementing unique post-processing tasks, or swapping models on the fly.

This article outlines how to create a custom GStreamer plugin in Python using the DeepStream SDK and pyservicemaker. We will explore the necessary steps to implement a custom inference process while maintaining high throughput in a DeepStream pipeline.

Understanding DeepStream metadata

Each buffer flowing through a DeepStream pipeline carries not just pixel data, but also rich metadata. Specifically, as frames pass through nvstreammux, each GstBuffer contains an attached NvDsBatchMeta structure. This metadata hierarchy, as described in NVIDIA’s documentation, consists of:

NvDsBatchMeta
    ├── NvDsUserMeta (batch-level custom metadata)
    └── NvDsFrameMeta (one per source stream)
        ├── NvDsUserMeta (frame-level custom metadata)
        └── NvDsObjectMeta (one per detected object)
            ├── NvDsClassifierMeta
            └── NvDsUserMeta (object-level custom metadata)

The architecture of DeepStream’s metadata is crucial for understanding how to enhance its capabilities with custom plugins. Each NvDsObjectMeta corresponds to an individual detection, allowing us to write our own detections directly into this structure, similar to how nvinfer operates. However, since NvDsObjectMeta instances cannot be created directly in Python, we must use batch_meta.acquire_object_meta() to obtain pre-allocated instances from the metadata pool maintained by NvDsBatchMeta.
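As a minimal sketch of that acquisition pattern, the hypothetical helper below takes an object from the pool, fills it in, and attaches it to a frame. Only batch_meta.acquire_object_meta() comes from the text above. The attribute names class_id and confidence, the frame_meta.append() call, and the helper name add_detection are assumptions for illustration and should be checked against the pyservicemaker reference. The bounding box is left out on purpose, since its field names are not given here.

```python
def add_detection(batch_meta, frame_meta, class_id, confidence):
    """Attach one detection to a frame using a pooled object-meta instance."""
    # Object metadata is taken from the pool owned by the batch,
    # never constructed directly in Python.
    obj_meta = batch_meta.acquire_object_meta()

    # ASSUMPTION: these attribute names mirror the NvDsObjectMeta fields.
    obj_meta.class_id = class_id
    obj_meta.confidence = confidence

    # ASSUMPTION: the frame metadata exposes append() to take ownership
    # of the filled-in object.
    frame_meta.append(obj_meta)
    return obj_meta
```

Because the helper only relies on duck typing, it can be exercised with stand-in objects before being wired into a real pipeline.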

The Python bridge: pyservicemaker

To work with DeepStream metadata in Python, the recommended approach is to utilize pyservicemaker. This SDK enables developers to create custom inference elements that can interact seamlessly with NVIDIA’s pipeline.

To extend a pipeline, developers subclass BatchMetadataOperator and implement the handle_metadata(batch_meta) method. The method receives the full NvDsBatchMeta for every buffer flowing through the pipeline, so it can read or modify any level of the hierarchy.
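The subclassing pattern can be sketched as follows. This is an illustrative operator, not the plugin from this article: ObjectCounter is a hypothetical name, and the object_items and class_id attributes follow NVIDIA's pyservicemaker samples, so they should be verified against the installed version. The import fallback is only there so the sketch can be read and run without DeepStream.

```python
try:
    from pyservicemaker import BatchMetadataOperator
except ImportError:
    # Fallback stub so the sketch runs without DeepStream installed.
    class BatchMetadataOperator:
        pass


class ObjectCounter(BatchMetadataOperator):
    """Counts detected objects per class across every batch it sees."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_metadata(self, batch_meta):
        # Walk the hierarchy: batch -> frames -> detected objects.
        for frame_meta in batch_meta.frame_items:
            # ASSUMPTION: object_items and class_id as in NVIDIA's samples.
            for object_meta in frame_meta.object_items:
                class_id = object_meta.class_id
                self.counts[class_id] = self.counts.get(class_id, 0) + 1
```

Keeping the operator free of pipeline-specific state makes it easy to test against stand-in metadata objects.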

One of the standout features of pyservicemaker is its Buffer wrapper around Gst.Buffer, which provides direct access to the batch_meta and a method to extract a DLPack handle to each frame’s GPU memory. This supports zero-copy inference, allowing frames to be handed directly to TensorRT without additional overhead.

Constructing the GStreamer plugin

When constructing a new GStreamer plugin, several key components must be in place. Unlike a traditional compiled plugin written in C, a Python plugin requires minimal setup.

The GStreamer system looks for plugins in the GST_PLUGIN_PATH. For Python plugins specifically, the python/ subdirectory under this path must contain the plugin file. Once the structure is established, GStreamer can discover the plugin without any compilation process.
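The expected layout can be sketched in a few lines of Python. The file name my_inference.py is a hypothetical placeholder, and a temporary directory is used so the snippet can run anywhere. In practice the root would be a permanent plugin directory.

```python
import os
import tempfile
from pathlib import Path


def make_plugin_tree(root: Path, plugin_file: str = "my_inference.py") -> Path:
    """Create <root>/python/<plugin_file> and return the path to the file."""
    python_dir = root / "python"          # Python plugins live in this subdirectory
    python_dir.mkdir(parents=True, exist_ok=True)
    plugin_path = python_dir / plugin_file
    plugin_path.touch()                   # empty placeholder for the plugin source
    return plugin_path


root = Path(tempfile.mkdtemp(prefix="gst_plugins_"))
plugin_path = make_plugin_tree(root)

# GST_PLUGIN_PATH points at the root, not at the python/ subdirectory.
os.environ["GST_PLUGIN_PATH"] = str(root)
print(plugin_path.relative_to(root).as_posix())
```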

A basic skeleton for an inference plugin should inherit from GstBase.BaseTransform. This permits the plugin to receive video buffers, run inference, and attach metadata while leaving the original buffer intact. Here’s a simplified structure:


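import gi
gi.require_version("Gst", "1.0")
gi.require_version("GstBase", "1.0")
from gi.repository import Gst, GstBase

class GStreamerInferenceElement(GstBase.BaseTransform):
    __gstmetadata__ = ("Custom Inference Plugin", "Transform", "A plugin for custom inference in DeepStream", "Author")
    __gsttemplates__ = (...)  # Input and output pad templates

    def do_transform_ip(self, buf):
        # Inference logic goes here; the buffer is processed in place
        return Gst.FlowReturn.OK  # Report success; the buffer continues downstream unmodified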

To ensure correct registration, the plugin must declare its pad templates through __gsttemplates__ and its properties through __gproperties__, and implement the matching GObject accessor methods (do_get_property and do_set_property).

Verifying and inspecting the plugin

Once the plugin skeleton is constructed, it is crucial to verify its registration. This can be achieved with the command:


GST_PLUGIN_PATH=/path/to/your/plugins gst-inspect-1.0 your_plugin_name

If registration succeeded, the output lists the element's metadata, pad templates, and registered properties. This confirms that GStreamer recognizes the custom plugin and makes it available for use in pipelines.
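The same check can be scripted, for example in a test suite. The sketch below wraps the command shown above. The function names are hypothetical, and it assumes gst-inspect-1.0 exits with a non-zero status when the element is not found.

```python
import os
import shutil
import subprocess


def inspect_command(plugin_dir, element_name):
    """Build the gst-inspect-1.0 command and an environment with GST_PLUGIN_PATH set."""
    env = dict(os.environ, GST_PLUGIN_PATH=plugin_dir)
    cmd = ["gst-inspect-1.0", element_name]
    return cmd, env


def is_registered(plugin_dir, element_name):
    """Return True/False for the registration check, or None if the tool is missing."""
    cmd, env = inspect_command(plugin_dir, element_name)
    if shutil.which(cmd[0]) is None:
        return None  # GStreamer command-line tools are not installed
    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    # ASSUMPTION: a non-zero exit status means the element was not found.
    return result.returncode == 0
```

Calling is_registered("/path/to/your/plugins", "your_plugin_name") mirrors the shell command, with the path and element name substituted for the real ones.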

The complete code, including the inference logic, is available in a GitHub Gist. Inspect it, use it as a reference, or modify it for specific application needs.

Handling compatibility issues

While integrating the Ultralytics YOLO model with TensorRT, a compatibility issue may arise between TensorRT Python bindings and the GStreamer Python wrapper (PyGObject). To overcome this obstacle, developers can redefine the tuple object locally in the tensor backend, allowing the pipeline to function correctly without encountering runtime errors.

Inside do_transform_ip, the inference logic can use DLPack to assemble a batch tensor from the frames while keeping the data on the GPU:


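import torch

# buffer is the pyservicemaker Buffer wrapping the incoming Gst.Buffer
frames = []
for frame_meta in batch_meta.frame_items:
    # Wrap the frame's GPU memory as a torch tensor via DLPack, without copying it
    t = torch.utils.dlpack.from_dlpack(buffer.extract(frame_meta.batch_id))
    frames.append(t)
# Stacking requires every frame to have the same shape
batch = torch.stack(frames, dim=0)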

This allows for seamless integration while maintaining the performance benefits expected from DeepStream.

Final thoughts and further exploration

After establishing the fundamentals for a custom GStreamer plugin, developers can freely experiment with different models and architectures. The processes discussed are applicable to various use cases, enabling seamless integration of custom inference stacks into DeepStream pipelines.

By varying models or structures, developers can explore additional capabilities like multi-stream setups or advanced video understanding techniques. The modular design encourages experimentation and adaptation, paving the way for innovative solutions in video analytics.

Full plugin code is hosted on GitHub, encouraging collaboration and the sharing of enhancements within the DeepStream community. With this knowledge, developers are equipped to build robust video analytics applications tailored to specific requirements while keeping runtime efficiency intact.