How to implement Object Detection in Video with Gstreamer in Python using Tensorflow?

10 min. read

In this tutorial we are going to implement an Object Detection plugin for Gstreamer using pre-trained models from the Tensorflow Model Zoo, and inject it into a video streaming pipeline.

Requirements

Code

Learn how to:

  • Create a gstreamer plugin that detects objects in each video frame with Tensorflow, using models from the Tensorflow Model Zoo
  • Use gstreamer plugins

Preface

In previous posts we’ve already learned how to create a simple Gstreamer plugin in Python. Now let’s take a step forward.

Guide

Preparation

First, clone the repository with the prepared models, video, and code, so we can work with the code samples from the beginning.

git clone https://github.com/jackersson/gst-plugins-tf.git
cd gst-plugins-tf

Use a virtual environment to keep the project’s dependencies isolated and in one place. Create and activate a new one:

python3 -m venv venv
source venv/bin/activate

Then install the project’s requirements.

pip install --upgrade wheel pip setuptools
pip install --upgrade --requirement requirements.txt

For model inference, install Tensorflow. Check first whether your PC has a CUDA-enabled GPU (otherwise install the CPU version):

# CPU-only support
pip install tensorflow==1.15

# CPU and GPU support
pip install tensorflow-gpu==1.15
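
To quickly verify the installation and check whether Tensorflow actually sees a GPU, you can use the TF 1.x test helper (removed in later TF 2.x releases):

# prints the installed version and whether a GPU is visible
python -c "import tensorflow as tf; print(tf.__version__, tf.test.is_gpu_available())"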

Additionally, to keep projects reproducible at any time, I prefer to use Data Version Control (DVC) for models, data, and other large files. As a storage service I use Google Cloud Storage (free, easy to use and set up).

export GOOGLE_APPLICATION_CREDENTIALS=$PWD/credentials/gs_viewer.json
dvc pull

Now check the data/ folder: there should be a prepared model (.pb) and a video (.mp4), so you can easily run the tests on your own.

Define baseline gstreamer pipeline

Launch the following pipeline in a terminal to check that gstreamer works properly:

gst-launch-1.0 filesrc location=data/videos/trailer.mp4 ! decodebin ! \
videoconvert ! video/x-raw,format=RGB ! videoconvert ! autovideosink

Basically, the pipeline:

  • captures frames from the video file using filesrc,
  • converts frames to the RGB colorspace using videoconvert and a capsfilter with the pre-defined colorspace string “video/x-raw,format=RGB”,
  • displays frames in a window with autovideosink.
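
If you prefer to drive the same pipeline from Python instead of gst-launch-1.0, a minimal equivalent with the GStreamer Python bindings looks roughly like this:

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)

# same pipeline description as the gst-launch-1.0 command above
pipeline = Gst.parse_launch(
    "filesrc location=data/videos/trailer.mp4 ! decodebin ! "
    "videoconvert ! video/x-raw,format=RGB ! videoconvert ! autovideosink")
pipeline.set_state(Gst.State.PLAYING)

# block until an error occurs or the stream ends
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.ERROR | Gst.MessageType.EOS)
pipeline.set_state(Gst.State.NULL)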

Run

Display mode

For now, let’s run a simple predefined command to check that everything is working:

./run_example.sh

Text mode

Export the required paths to enable the plugin and make it visible to gstreamer:

export GST_PLUGIN_PATH=$GST_PLUGIN_PATH:$PWD/venv/lib/gstreamer-1.0/:$PWD/gst/

Note

$PWD/venv/lib/gstreamer-1.0/ 
  • Path to libgstpython.cpython-36m-x86_64-linux-gnu.so (built from gst-python)
$PWD/gst/
  • Path to Gstreamer Plugins implementation (python scripts)
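
With the paths exported, you can check that gstreamer actually sees the plugin with the standard inspection tool:

# should print the element's pads, caps and properties (model, config)
gst-inspect-1.0 gst_tf_detection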

Run the command with debug messages enabled:

Note: check the gstreamer debugging tools to learn how to enable logging

GST_DEBUG=python:5 \
gst-launch-1.0 filesrc location=data/videos/trailer.mp4 ! \
decodebin ! videoconvert ! \
gst_tf_detection config=data/tf_object_api_cfg.yml ! \
videoconvert ! fakesink

Note: the application should print output similar to the following (a list of dicts with each object’s class_name, confidence, and bounding_box)

[{'confidence': 0.6499642729759216, 'bounding_box': [402, 112, 300, 429], 'class_name': 'giraffe'},
 {'confidence': 0.4659585952758789, 'bounding_box': [761, 544, 67, 79], 'class_name': 'person'}]

Great, now let’s go through the code.

Explanation

In a previous post (How to write a Gstreamer Plugin with Python) we discovered that we can easily access image data from a gstreamer plugin. Now let’s run a model on the retrieved image data and display the inference results in the console or on video.

Let’s start by implementing the Object Detection plugin (gst_tf_detection).

Define plugin class

First, define a plugin class that extends GstBase.BaseTransform (the base class for elements that process data). The plugin’s name is “gst_tf_detection”; with this name the plugin can be called inside a gstreamer pipeline.

class GstTfDetectionPluginPy(GstBase.BaseTransform):
    GST_PLUGIN_NAME = 'gst_tf_detection'
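
For gst-launch-1.0 to find the element, the class also has to be registered with GObject and exposed through a module-level __gstelementfactory__ tuple. This is standard gst-python boilerplate; a minimal sketch:

# register the GType and expose the element factory to gstreamer
GObject.type_register(GstTfDetectionPluginPy)
__gstelementfactory__ = (GstTfDetectionPluginPy.GST_PLUGIN_NAME,  # 'gst_tf_detection'
                         Gst.Rank.NONE,
                         GstTfDetectionPluginPy)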

Fixate stream format

Next, define the input and output buffer formats for the plugin. Since our model consumes RGB image data, let’s specify it:

_srctemplate = Gst.PadTemplate.new('src', Gst.PadDirection.SRC,
                                   Gst.PadPresence.ALWAYS,
                                   Gst.Caps.from_string("video/x-raw,format=RGB"))

_sinktemplate = Gst.PadTemplate.new('sink', Gst.PadDirection.SINK,
                                    Gst.PadPresence.ALWAYS,
                                    Gst.Caps.from_string("video/x-raw,format=RGB"))

Define properties

Additionally, let’s define the following properties so we can pass:

  • a model instance (with the proper interface)
    • to pass a Python object, use GObject.TYPE_PYOBJECT as the parameter type
  • a model config, so we can easily modify the model’s parameters without changing a single line of code
    • to pass a string, use str as the parameter type
    • a configuration file with parameters is common practice for setting up plugins once the number of parameters grows beyond three

__gproperties__ = {
    "model": (GObject.TYPE_PYOBJECT,
              "model",
              "Contains Python Model Instance",
              GObject.ParamFlags.READWRITE),

    "config": (str,
               "Path to config file",
               "Contains path to config file supported by Python Model Instance",
               None,  # default
               GObject.ParamFlags.READWRITE),
}

Note: passing the model as a property also reduces memory consumption. If the model were initialized each time a plugin is created, the model’s weights would be duplicated in memory, so the number of pipelines that can run simultaneously would be limited by the amount of memory. But if the model is created once and passed to each plugin as a reference, the only limitation is hardware performance.
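
Since a GObject.TYPE_PYOBJECT property can only be set from code (not from a gst-launch-1.0 string), sharing one model across pipelines looks roughly like the sketch below; from_config_file is the helper the plugin itself uses, and the element name "detector" is just an example:

# build the model once, reuse it across pipelines
model = from_config_file("data/tf_object_api_cfg.yml")

pipeline = Gst.parse_launch(
    "filesrc location=data/videos/trailer.mp4 ! decodebin ! videoconvert ! "
    "gst_tf_detection name=detector ! videoconvert ! fakesink")

# pass the Python object by reference instead of re-initializing it
pipeline.get_by_name("detector").set_property("model", model)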

Now, to specify the model’s config, use the following:

gst_tf_detection config=tf_object_api_cfg.yml

Implement get-set handlers for defined properties

GET

def do_get_property(self, prop: GObject.GParamSpec):
    if prop.name == 'model':
        return self.model
    elif prop.name == 'config':
        return self.config
    else:
        raise AttributeError('Unknown property %s' % prop.name)

SET

def do_set_property(self, prop: GObject.GParamSpec, value):
    if prop.name == 'model':
        self._do_set_model(value)
    elif prop.name == "config":
        self._do_set_model(from_config_file(value))
        self.config = value
        Gst.info(f"Model's config updated from {self.config}")
    else:
        raise AttributeError('Unknown property %s' % prop.name)

When the model’s config is updated, we need to shut down the previous instance, then initialize and start the new one.

def _do_set_model(self, model):

    # stop previous instance
    if self.model:
        self.model.shutdown()
        self.model = None

    self.model = model

    # start new instance
    if self.model:
        self.model.startup()
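
For reference, the model object is assumed to implement a small lifecycle interface; this is a hypothetical sketch of it (the repository’s TfObjectDetectionModel provides these methods):

class Model:
    """Hypothetical interface expected by gst_tf_detection."""

    def startup(self):
        # load weights, create a session, warm up
        ...

    def shutdown(self):
        # release the session and any GPU memory
        ...

    def process_single(self, image):
        # image: np.ndarray of shape (height, width, 3), RGB
        # returns: list of dicts with class_name, confidence, bounding_box
        ...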

Implement transform()

First, define the in-place buffer-processing function do_transform_ip(), which accepts a Gst.Buffer and returns a status (Gst.FlowReturn).

def do_transform_ip(self, buffer: Gst.Buffer) -> Gst.FlowReturn:

Then, if there is no model, the plugin should work in passthrough mode.

if self.model is None:
    Gst.warning(f"No model specified for {self}. Plugin working in passthrough mode")
    return Gst.FlowReturn.OK

Otherwise, we convert the Gst.Buffer to an np.ndarray, feed the image to the model (inference), print the results to the console, and write the objects to the Gst.Buffer as metadata (recap: How to add metadata to a gstreamer buffer), so the detected objects can be transmitted further along the pipeline.

import gstreamer.utils as utils
from gstreamer.gst_objects_info_meta import gst_meta_write

# Convert Gst.Buffer to np.ndarray
caps = self.sinkpad.get_current_caps()
image = utils.gst_buffer_with_caps_to_ndarray(buffer, caps)

# model inference
objects = self.model.process_single(image)

Gst.debug(f"Frame id ({buffer.pts // buffer.duration}). Detected {str(objects)}")
           
# write objects to Gst.Buffer as metadata
gst_meta_write(buffer, objects)
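
Putting the fragments together, the whole handler looks roughly like this (a sketch assembled from the pieces above):

def do_transform_ip(self, buffer: Gst.Buffer) -> Gst.FlowReturn:
    if self.model is None:
        Gst.warning(f"No model specified for {self}. Plugin working in passthrough mode")
        return Gst.FlowReturn.OK

    # convert Gst.Buffer to np.ndarray using the negotiated caps
    caps = self.sinkpad.get_current_caps()
    image = utils.gst_buffer_with_caps_to_ndarray(buffer, caps)

    # model inference
    objects = self.model.process_single(image)
    Gst.debug(f"Frame id ({buffer.pts // buffer.duration}). Detected {str(objects)}")

    # write objects to Gst.Buffer as metadata
    gst_meta_write(buffer, objects)
    return Gst.FlowReturn.OK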

Tensorflow Model Implementation

We won’t deep-dive into the Tensorflow model implementation here; just have a look at the code. The TfObjectDetectionModel class hides:

  • tf.Graph import
  • device configuration
  • model parameters (e.g. threshold, labels, input size)

Additional

Model Configuration file

  • a file with common editable parameters for model inference. For example:
weights: "model.pb" 
width: 300
height: 300
threshold: 0.4 
device: "GPU|CPU" 
labels: "labels.yml" 
log_device_placement: false
per_process_gpu_memory_fraction: 0.0 
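
The config is plain YAML, so loading it is straightforward; a minimal sketch with PyYAML (the repository’s from_config_file helper builds the model from parameters like these):

import yaml

# parse the model parameters from the YAML config file
with open("data/tf_object_api_cfg.yml") as f:
    config = yaml.safe_load(f)

print(config["threshold"], config["width"], config["height"])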

Labels format

  • a file with lines of <class_id: class_name> pairs. For example:
1: person
2: bicycle 
3: car 
4: motorcycle 
... 
90: toothbrush

Object Detection Overlay Plugin

To draw the detected objects on video, there is an implementation of the gst_detection_overlay plugin (recap: “How to draw a kitten with Gstreamer“).

The main differences compared to the gst_tf_detection plugin:

The input and output buffer format is now RGBx (a 4-channel format), so we can work with the buffer using the cairo library.

Gst.Caps.from_string("video/x-raw,format={RGBx}")

Request the detected objects’ info from the buffer using the gstreamer-python package:

from gstreamer.gst_objects_info_meta import gst_meta_get

objects = gst_meta_get(buffer)

To enable drawing on the buffer (in-place), use the gstreamer-python package as well:

from gstreamer import map_gst_buffer
import gstreamer.utils as utils

caps = self.sinkpad.get_current_caps()
width, height = utils.get_buffer_size_from_gst_caps(caps)

# Do drawing
with map_gst_buffer(buffer, Gst.MapFlags.READ | Gst.MapFlags.WRITE) as mapped:
    draw(mapped, width, height, objects)
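
The draw() function itself is implemented in the repository; for illustration only, a hypothetical minimal version with cairo could look like this (assuming the mapped buffer is writable and 4 bytes per pixel):

import cairo

def draw(mapped_buffer, width, height, objects):
    """Hypothetical sketch: draw bounding boxes and labels with cairo."""
    stride = cairo.ImageSurface.format_stride_for_width(cairo.FORMAT_ARGB32, width)
    surface = cairo.ImageSurface.create_for_data(
        mapped_buffer, cairo.FORMAT_ARGB32, width, height, stride)
    context = cairo.Context(surface)
    context.set_line_width(2)
    context.set_source_rgb(1, 0, 0)  # red boxes

    for obj in objects:
        x, y, w, h = obj['bounding_box']
        context.rectangle(x, y, w, h)
        context.stroke()
        # label above the box: "class_name: confidence"
        context.move_to(x, y - 5)
        context.show_text(f"{obj['class_name']}: {obj['confidence']:.2f}")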

Tuning

weights: "path/to/new/model/frozen_inference_graph.pb"
  • change video input
    • run whole pipeline on your video file, from camera or stream
  • change model’s config
    • reduce false positives with higher confidence threshold
threshold: 0.7
  • improve quality with increasing input size
width: 600
height: 600
  • leave target labels only
1: person
3: car

Conclusion

With the Gstreamer Python bindings you can inject any Tensorflow model into any video streaming pipeline. Custom plugins with Tensorflow models are already used by popular video analytics frameworks.

Hope everything works as expected 😉 If you have trouble running the code, leave a comment or open an issue on Github.
