How to implement Video Crop Gstreamer plugin? (Caps negotiation)

10 min. read

This guide shows an implementation of a video crop Gstreamer plugin and presents an approach to implementing plugins whose input and output resolutions differ.



Learn how to?


Most Gstreamer plugins have different input and output formats. For example, videoconvert converts formats from YUV to RGB, while videoscale and videocrop change the video resolution. In Computer Vision tasks we often want to run object detection only on a specific Region of Interest (ROI), which also reduces memory consumption and computational cost.

In Gstreamer, plugins with different input and output formats are implemented using the Caps Negotiation technique. I find the official documentation quite difficult to understand, so let’s work through an example.

Note: Recall that caps (or capabilities) describe media types. For example, to describe raw, Full HD video in RGB colorspace with 30 frames per second we can use:
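
As a minimal sketch of what such a description could look like, the snippet below assembles a caps string for that case. The helper name make_caps_string is hypothetical (it is not part of Gstreamer); the resulting string follows the usual caps-string syntax and could be parsed with Gst.Caps.from_string, which is used later in this post.

```python
def make_caps_string(media_type: str, **fields) -> str:
    """Join a media type and its fields into a caps string (hypothetical helper)."""
    parts = [media_type] + [f"{key}={value}" for key, value in fields.items()]
    return ",".join(parts)


# Raw Full HD (1920x1080) RGB video at 30 frames per second.
full_hd_rgb = make_caps_string(
    "video/x-raw", format="RGB", width=1920, height=1080, framerate="30/1"
)
print(full_hd_rgb)
```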



Note: This tutorial is heavily based on audioplot (an official example from the gst-python repository)

Define Caps

First, let’s define input and output capabilities (Gst.Caps). Our video crop plugin is going to support all RGB-based colorspaces and grayscale:

FORMATS = [f.strip() for f in (RGB + GRAY).split(',')] 

The video format is going to be RAW: video/x-raw

Video resolution capabilities are going to be in the range [1, MAXINT]:

range(1, GLib.MAXINT)

To pass these parameters in a Gst-friendly format, use Gst.ValueList and Gst.IntRange:

raw_video = 'video/x-raw'
formats = Gst.ValueList(FORMATS)
width = Gst.IntRange(range(1, GLib.MAXINT))
height = Gst.IntRange(range(1, GLib.MAXINT))

Now, let’s put everything together for input caps:

IN_CAPS = Gst.Caps(Gst.Structure(raw_video, format=formats, width=width, height=height))

And for output caps:

OUT_CAPS = Gst.Caps(Gst.Structure(raw_video, format=formats, width=width, height=height))

Note that input and output caps are equal for now. So the filter element built on those caps can handle different formats.
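
For reference, the same kind of template can be written down as a caps string. The sketch below only builds that string in plain Python; EXAMPLE_FORMATS is an illustrative subset (the FORMATS list above covers more formats), template_caps_string is a hypothetical helper, and MAXINT is assumed to be 2**31 - 1, which is what GLib.MAXINT evaluates to on common platforms.

```python
MAXINT = 2**31 - 1  # assumed value of GLib.MAXINT (32-bit signed int)

# Illustrative subset only; the post's FORMATS list is longer.
EXAMPLE_FORMATS = ["RGB", "GRAY8"]


def template_caps_string(formats, max_size=MAXINT) -> str:
    """Build a caps string with a format list and width/height ranges."""
    format_list = "{ " + ", ".join(formats) + " }"
    size_range = f"[ 1, {max_size} ]"
    return (
        f"video/x-raw, format=(string){format_list}, "
        f"width=(int){size_range}, height=(int){size_range}"
    )


template = template_caps_string(EXAMPLE_FORMATS)
print(template)
```

Curly braces denote a list of allowed values and square brackets denote a range, which is the distinction Gst.ValueList and Gst.IntRange express in code.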

Implement plugin

We are going to extend GstBase.BaseTransform in order to build videocrop plugin. All steps are well-described here: How to write Gstreamer plugin with Python. Let’s start with class definition:

class GstVideoCrop(GstBase.BaseTransform):
  • name: gstvideocrop
  • description metadata
("Crop",  # Name
 "Filter/Effect/Video",  # Transform
 "Crops Video into user-defined region",  # Description
 "Taras Lishchenko <taras at lifestyletransfer dot com>") # Author
  • pads (inputs and outputs): "sink" and "src"

Note: Recall that a plugin’s sink is its input and src is its output

Define the transform function. For videocrop we are going to use do_transform, as we change the dimensions of the output buffer. Let’s leave it empty for now.

def do_transform(self, inbuffer: Gst.Buffer, outbuffer: Gst.Buffer) -> Gst.FlowReturn:
    # empty
    return Gst.FlowReturn.OK

Put the following code at the end of the file in order to register the plugin and use it from the command line:

__gstelementfactory__ = ("gstvideocrop",
                         Gst.Rank.NONE, GstVideoCrop)

Define properties

Note: Properties are described in the following post “How to write Gstreamer plugin with Python” (Properties Section)

Properties are:

  • top
  • left
  • right
  • bottom

Each property is the number of pixels to skip from the corresponding side of the image. All properties are of type int, in the range [-MAXINT, MAXINT], with a default value of 0.

Note: When a property is negative, the image is padded with black pixels on that side. When the properties exceed the image limits, a ValueError is raised.
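
To make these semantics concrete, here is a small standalone sketch of how the four properties map to an output size. The function name cropped_size is hypothetical; the checks and the arithmetic mirror what the plugin does later during caps fixation.

```python
def cropped_size(in_width, in_height, left=0, right=0, top=0, bottom=0):
    """Return (width, height) after skipping pixels from each side.

    Negative values grow the image (the extra area is padded).
    """
    if left + right > in_width:
        raise ValueError("Left and Right Bounds exceed Input Width")
    if top + bottom > in_height:
        raise ValueError("Top and Bottom Bounds exceed Input Height")
    return in_width - left - right, in_height - top - bottom


print(cropped_size(320, 240, left=10, right=20, top=20, bottom=80))  # (290, 140)
print(cropped_size(320, 240, left=-10))  # (330, 240)
```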

Note: The plugin is very similar to the official videocrop plugin. To check its properties use:

gst-inspect-1.0 videocrop

Caps negotiation

Note: Caps negotiation for transform element is properly described here

Transform caps

Caps transform is used to find the intersection (Gst.Caps.intersect) between input and output capabilities. Take a look at the overridden do_transform_caps implementation:

def do_transform_caps(self, direction: Gst.PadDirection, caps: Gst.Caps, filter_: Gst.Caps) -> Gst.Caps:
    caps_ = IN_CAPS if direction == Gst.PadDirection.SRC else OUT_CAPS

    # intersect caps if a filter is given
    if filter_:
        # create new caps that contains all formats that are common to both
        caps_ = caps_.intersect(filter_)

    return caps_

Note: Caps intersection is explained in the following Colab Notebook


# IN Caps
video/x-raw, format=(string){ RGB, GRAY8 }, width=(int){ 320, 640, 1280, 1920 }, height=(int){ 240, 480, 720, 1080 }

# OUT Caps
video/x-raw, format=(string){ RGB }, width=(int){ 640, 1280, 1920 }, height=(int){ 480, 720, 1080 }

# Intersection
video/x-raw, format=(string)RGB, width=(int){ 640, 1280, 1920 }, height=(int){ 480, 720, 1080 }
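
The example above can be reproduced with a simplified plain-Python model, where each caps structure is a dict of allowed values per field. This is only an illustration of the idea, not how Gst.Caps.intersect is implemented: intersect_fields is a hypothetical helper, and it assumes both sides list the same fields and use value lists rather than ranges.

```python
def intersect_fields(a: dict, b: dict) -> dict:
    """Keep, for every field, only the values allowed by both sides."""
    result = {}
    for field, values in a.items():
        common = [value for value in values if value in b[field]]
        if not common:
            return {}  # one field has nothing in common: empty intersection
        result[field] = common
    return result


in_caps = {"format": ["RGB", "GRAY8"],
           "width": [320, 640, 1280, 1920],
           "height": [240, 480, 720, 1080]}
out_caps = {"format": ["RGB"],
            "width": [640, 1280, 1920],
            "height": [480, 720, 1080]}

common = intersect_fields(in_caps, out_caps)
print(common)
```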

Fixate Caps

Caps from the previous step can store ranges or lists. In the following example, format is represented by the list {RGB, GRAY8} and height by the list {240, 480, 720, 1080}.

video/x-raw, format=(string){ RGB, GRAY8 }, width=(int){ 320, 640, 1280, 1920 }, height=(int){ 240, 480, 720, 1080 }

Caps are fixed when there are no properties with ranges or lists. For example, execute the following code and pay attention to the caps after the fixate() call. There are no lists left:

caps_string = "video/x-raw, format=(string)RGB, width=(int){ 640, 1280, 1920 }, height=(int){ 480, 720, 1080 }"

caps = Gst.Caps.from_string(caps_string)
print(f"Before fixate: {caps}")
caps = caps.fixate()
print(f"After fixate: {caps}")

> Before fixate: video/x-raw, format=(string)RGB, width=(int){ 640, 1280, 1920 }, height=(int){ 480, 720, 1080 }
> After fixate: video/x-raw, format=(string)RGB, width=(int)640, height=(int)480

Note: Caps fixate is explained in the following Colab Notebook

Now, override do_fixate_caps(). In the following function we are going to:

  • calculate output resolution based on properties: left, right, top, bottom
  • fixate caps, to store only output resolution (nearest)

def do_fixate_caps(self, direction: Gst.PadDirection, caps: Gst.Caps, othercaps: Gst.Caps) -> Gst.Caps:
    """
        caps: initial caps
        othercaps: target caps
    """
    if direction == Gst.PadDirection.SRC:
        return othercaps.fixate()
    else:
        # calculate the output width and height according to specified properties: top, left, bottom, right
        in_width, in_height = [caps.get_structure(0).get_value(v) for v in ['width', 'height']]

        if (self._left + self._right) > in_width:
            raise ValueError("Left and Right Bounds exceed Input Width")

        if (self._bottom + self._top) > in_height:
            raise ValueError("Top and Bottom Bounds exceed Input Height")

        width = in_width - self._left - self._right
        height = in_height - self._top - self._bottom

        new_format = othercaps.get_structure(0).copy()

        # pick the allowed values closest to the target output resolution
        new_format.fixate_field_nearest_int("width", width)
        new_format.fixate_field_nearest_int("height", height)

        # wrap the fixated structure into new caps
        new_caps = Gst.Caps.new_empty()
        new_caps.append_structure(new_format)

        return new_caps.fixate()

Note: fixate_field_nearest_int() is used to fixate caps with the nearest possible value to our target output width and height

Example in the following Colab Notebook

Before fixate: video/x-raw, format=(string)RGB, width=(int){ 640, 1280, 1920 }, height=(int){ 480, 720, 1080 }

fixate_field_nearest_int('width', 1920)
fixate_field_nearest_int('height', 1080)

After fixate: video/x-raw, format=(string)RGB, width=(int)1920, height=(int)1080
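
The selection rule can be illustrated without Gstreamer. The sketch below is a simplified model for list-valued fields only (the real fixate_field_nearest_int also handles ranges, and tie-breaking is not modelled here); nearest_value is a hypothetical helper name.

```python
def nearest_value(candidates, target):
    """Return the candidate with the smallest distance to target."""
    return min(candidates, key=lambda value: abs(value - target))


widths = [640, 1280, 1920]
heights = [480, 720, 1080]

# An exact match is kept as is.
print(nearest_value(widths, 1920), nearest_value(heights, 1080))  # 1920 1080
# Otherwise the closest allowed value wins.
print(nearest_value(widths, 700), nearest_value(heights, 700))  # 640 720
```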

Set Caps

The easiest part is to override do_set_caps(). We use it only to put the plugin into passthrough mode when the input resolution is equal to the output resolution.

def do_set_caps(self, incaps: Gst.Caps, outcaps: Gst.Caps) -> bool:
    """
        Note: Use do_set_caps only to set plugin in passthrough mode
    """

    in_w, in_h = [incaps.get_structure(0).get_value(v) for v in ['width', 'height']]
    out_w, out_h = [outcaps.get_structure(0).get_value(v) for v in ['width', 'height']]

    # if input_size == output_size set plugin to passthrough mode
    if in_h == out_h and in_w == out_w:
        self.set_passthrough(True)

    return True

Video Crop

The last part is do_transform. Since caps negotiation has already been performed, cropping the image is now straightforward.

First, let’s convert Gst.Buffer to np.ndarray for input and output data.

in_image = gst_buffer_with_caps_to_ndarray(inbuffer, self.sinkpad.get_current_caps())

out_image = gst_buffer_with_caps_to_ndarray(outbuffer, self.srcpad.get_current_caps())

Define margins to crop/expand for each side:

h, w = in_image.shape[:2]

left, top = max(self._left, 0), max(self._top, 0)
bottom = h - self._bottom
right = w - self._right

Crop the image with an array slice, using the margins calculated above:

crop = in_image[top:bottom, left:right]

Use cv2.copyMakeBorder to expand the image with black pixels to meet the output resolution. Finally, copy the resulting pixels into the output buffer, so they can be passed further down the pipeline.

out_image[:] = cv2.copyMakeBorder(crop, top=abs(min(self._top, 0)),
                                        bottom=abs(min(self._bottom, 0)),
                                        left=abs(min(self._left, 0)),
                                        right=abs(min(self._right, 0)),
                                        borderType=cv2.BORDER_CONSTANT, value=0)
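
The margin and border arithmetic from this section can be checked in isolation, without buffers or OpenCV. In this sketch crop_geometry is a hypothetical helper: positive properties become slice bounds, negative ones become border widths. Unlike the snippet above, it clamps the slice end explicitly instead of relying on array slicing to clip it.

```python
def crop_geometry(h, w, left=0, right=0, top=0, bottom=0):
    """Split crop properties into slice bounds and border widths.

    Returns ((y0, y1, x0, x1), (pad_top, pad_bottom, pad_left, pad_right)).
    """
    # Positive values cut pixels away; clamp so the slice stays inside the image.
    y0, y1 = max(top, 0), min(h - bottom, h)
    x0, x1 = max(left, 0), min(w - right, w)
    # Negative values turn into border widths to be filled with black pixels.
    pads = tuple(abs(min(v, 0)) for v in (top, bottom, left, right))
    return (y0, y1, x0, x1), pads


# 320x240 input: crop 10 px left, 20 px top, 80 px bottom; pad 20 px on the right.
(y0, y1, x0, x1), pads = crop_geometry(240, 320, left=10, right=-20, top=20, bottom=80)
out_h = (y1 - y0) + pads[0] + pads[1]
out_w = (x1 - x0) + pads[2] + pads[3]
print((y0, y1, x0, x1), pads, (out_h, out_w))
```

The resulting size (140 rows by 330 columns) matches in_height - top - bottom and in_width - left - right, which is the resolution negotiated for the output buffer.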


Now, let’s test everything we’ve implemented previously. First, clone repository and prepare the environment:

git clone
cd gst-python-plugins

python3 -m venv venv
source venv/bin/activate
pip install -U wheel pip setuptools

pip install -r requirements.txt

First, let’s display a simple pipeline using videotestsrc.

gst-launch-1.0 videotestsrc ! videoconvert ! xvimagesink

Before testing the gstvideocrop plugin, export the following variable:

export GST_PLUGIN_PATH=$GST_PLUGIN_PATH:$PWD/venv/lib/gstreamer-1.0/:$PWD/gst/

Run the pipeline with the following command:

gst-launch-1.0 videotestsrc ! videoconvert ! gstvideocrop left=10 top=20 bottom=80 right=20 ! videoconvert ! xvimagesink

As you can see, the initial image is now cropped on all four sides.


In this guide we learned how to perform caps negotiation in order to implement a Gstreamer plugin with different input and output formats (e.g. videocrop). The main steps to negotiate caps are:

  • define
  • set
  • transform
  • fixate

Hope this is going to be helpful for your development 🙂 As always, if you have any suggestions, leave a comment or contact me via social networks or email.

