Deep Learning Video Analytics Frameworks based on Gstreamer
Video analytics applications (e.g., smart cities, retail, industry) consist of two main parts: video streaming and computer vision/deep learning frameworks. Here we’ll go through the available frameworks that let developers focus on the analytics part and hide the nuances of video streaming.
Overview
The general architecture of Video Analytics applications looks like the following.
Source: http://on-demand.gputechconf.com/gtc-cn/2018/pdf/CH8307.pdf
What stands behind each part of the architecture above?
Process | Variations |
Collect | Web/IP Camera; HTTP/RTP/RTSP Streaming; Video Files (single, multiple); Image Files (single, multiple) |
Decode/Encode | Video compression formats: MJPEG/H264/H265/…; Video containers: MPEG-4/AVI/MOV/…; Colorspaces: RGB/RGBA/BGR/YUV/… |
Pre-Process | Crop/Scale/Draw/Enhance/Filters |
Output | Video File (single, multiple) with Digital Video Record (DVR); Image File (single, multiple); Show Window (single, multiple, composite); Stream: TCP/UDP/HTTP/RTP/RTSP |
All of these steps require solid expertise in image/video processing, but the frameworks below can hide most of that complexity.
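To make the four stages in the table more concrete, here is a minimal sketch of how they can map onto a Gstreamer pipeline description. It only builds the text that could be passed to the `gst-launch-1.0` command line tool and does not run Gstreamer itself. The helper name `build_pipeline`, the file names, and the resolution/fps values are hypothetical and chosen for illustration. The elements (`filesrc`, `decodebin`, `videoconvert`, `videoscale`, `videorate`, `x264enc`, `mp4mux`, `filesink`) are standard Gstreamer elements, but whether this exact chain works for a given input file is an assumption.

```python
def build_pipeline(source, width, height, fps, output):
    """Return a gst-launch-1.0 style pipeline description as a plain string.

    Nothing is executed here; the function only joins element descriptions.
    """
    stages = [
        # Collect: read a local video file (hypothetical path)
        f"filesrc location={source}",
        # Decode: let Gstreamer pick a suitable demuxer and decoder
        "decodebin",
        # Pre-process: convert colorspace, scale, and adjust the frame rate
        "videoconvert",
        "videoscale",
        "videorate",
        f"video/x-raw,width={width},height={height},framerate={fps}/1",
        # Encode and output: H.264 in an MP4 container, written to disk
        "x264enc",
        "mp4mux",
        f"filesink location={output}",
    ]
    # Gstreamer links elements in a textual pipeline with " ! "
    return " ! ".join(stages)


pipeline = build_pipeline("input.mp4", 640, 480, 30, "output.mp4")
# The -e flag asks gst-launch-1.0 to send EOS on shutdown so the MP4 is finalized
print(f"gst-launch-1.0 -e {pipeline}")
```

Swapping the first stage for a network source, or the last stages for a display sink, changes the Collect or Output step without touching the rest. That is the kind of flexibility the frameworks below build on.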
General information and Top Video Analytics Frameworks.
Framework | Year | Maintainer | Language | Video Streaming Framework | Arch | OS |
OpenCV | 2010 | Community | C/C++; Python; Java | FFmpeg; Gstreamer | arm; arm64; x64; x86 | Linux; MacOS; Windows; iOS; Android |
GstInference | 2019 | RidgeRun | C/C++ | Gstreamer | | |
NNStreamer | 2018 | Samsung | C/C++ | Gstreamer | arm; arm64; x64; x86; *more | Tizen; Ubuntu; Android; Yocto; MacOS; *more |
DeepStream | 2018 | Nvidia | C/C++; Python | Gstreamer (GPU-accelerated: Nvidia) | arm; arm64; *more | Ubuntu; *more |
Gst-Video-Analytics | 2019 | OpenCV | C/C++ | Gstreamer (GPU-CPU accelerated: VAAPI, OpenGL) | x64; x86 | Linux*; *more |
What is under the hood of each Video Analytics framework? Video streaming and CV/ML framework support:
Framework | Video Streaming Framework | CV/ML Frameworks Support |
OpenCV | FFmpeg; Gstreamer | OpenCV |
GstInference | Gstreamer | Neural Compute SDK (NCSDK); Tensorflow; Caffe; TensorRT; OpenCV; *more |
NNStreamer* | Gstreamer | Tensorflow; Tensorflow-Lite; Pytorch; Caffe2 |
DeepStream | Gstreamer (GPU-accelerated: Nvidia) | TensorRT; Caffe |
Gst-Video-Analytics | Gstreamer (GPU-CPU accelerated: VAAPI, OpenGL) | OpenVINO; OpenCV |
In general, all of these frameworks are built on top of the open source media streaming libraries FFmpeg and Gstreamer. On the CV/ML side there is a variety of possible backends: Tensorflow, Tensorflow-Lite, TensorRT, Pytorch, Caffe, OpenVINO, OpenCV.
Thoughts
From the client’s perspective, most of the product development process can be reduced to the following goals (Video Analytics case):
- Reduce Development Costs/Time
  - Solutions:
    - reuse of existing solutions
    - balance between Software Engineering and Data Science in a common skill set
- Reduce Product Cost
  - Solutions:
    - processing multiple video feeds; use of fast and accurate models
      - efficient hardware usage
        - shared memory resources
        - up to 99% usage of hardware capabilities
        - reduced data storage and transmission
- Reduce Product Scaling Costs/Time
  - Solution:
    - generality: a single solution for multiple use cases
All of the listed frameworks help to build software faster and more efficiently.
Personal Experience
- I started prototyping Video Analytics applications with OpenCV. When new project requirements (resolution/fps setup, video recording, custom operations, performance improvements) pushed us beyond its limits, we switched to Gstreamer on the advice of another expert.
- I know C/C++ well, but I started my deep dive into Gstreamer with Python (dependency setup and development itself are easier). With Python I could focus mostly on how the framework works, so when prototyping we failed or succeeded faster.
- Exploring Gstreamer is a challenging but rewarding process. The lack of resources and community is a huge problem. The main pain was setting everything up and making Python and Gstreamer friends.
- Diving into Gstreamer helped me learn its architecture, code development approaches, and the basics of video processing. It was and still is an entertaining process. I think Gstreamer has one of the best architectures (interfaces, abstractions), which gives developers great flexibility and extensibility (sometimes the code might be dirty or unintuitive, but nothing is perfect …).
- Now I’m glad when I see other companies using Gstreamer for Video Analytics applications.
- I often use FFmpeg as a command line tool (commands are shorter and sometimes clearer).
- OpenCV works for me when a prototype has to be delivered quickly and there are no restrictions on hardware performance. OpenCV supports Gstreamer as well, but that requires an additional library build with extra options enabled (installing the pip package is so much easier).
- I constantly look for new repositories and frameworks that simplify the development of Video Analytics applications.
- Btw: Google’s examples for the Coral TPU Dev Board include both OpenCV and Gstreamer examples as well.
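Regarding the note above that OpenCV needs a build with Gstreamer enabled: a quick way to find out what a given installation supports is to inspect OpenCV’s build summary. The sketch below is a hedged example, not a definitive check. `cv2.getBuildInformation()`, `cv2.VideoCapture` and `cv2.CAP_GSTREAMER` are real OpenCV APIs, but the helper names `gstreamer_enabled` and `open_with_gstreamer` are hypothetical. The assumed layout of the summary line (`GStreamer: YES (version)` or `GStreamer: NO`) comes from typical builds and may differ between versions.

```python
import re


def gstreamer_enabled(build_info):
    """Return True if an OpenCV build summary reports Gstreamer as enabled.

    Assumes a line of the form "GStreamer:   YES (1.x.y)" or "GStreamer:   NO".
    """
    for line in build_info.splitlines():
        match = re.match(r"\s*GStreamer:\s*(\S+)", line)
        if match:
            return match.group(1).upper() == "YES"
    # No Gstreamer line at all: treat it as not enabled
    return False


def open_with_gstreamer(pipeline):
    """Open a Gstreamer pipeline through OpenCV, failing early if unsupported."""
    import cv2  # imported lazily so the parser above works without OpenCV

    if not gstreamer_enabled(cv2.getBuildInformation()):
        raise RuntimeError("This OpenCV build does not report Gstreamer support")
    # For capture, the pipeline description is expected to end with an appsink
    return cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
```

With a Gstreamer-enabled build, something like `open_with_gstreamer("videotestsrc ! videoconvert ! appsink")` should return a capture object whose `read()` yields test-pattern frames. Otherwise the helper raises before OpenCV silently falls back or fails to open the source.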
Conclusion
What features do they all have in common?
- Gstreamer is used for Video Streaming
- Tendency to provide Python bindings (so more developers can dive deep faster)
- Support for multiple Deep Learning/Computer Vision frameworks (generality)
Which one to choose is definitely up to you and your project requirements:
- OS?
- Target Architecture?
- Programming Language?
- Deadlines?
From my experience:
1. start with the simplest
2. explore, understand, exceed the limits
3. make a more conscious decision (based on the experience you have now)
4. repeat steps 2-3
In any case, additional knowledge of Video Streaming (with Gstreamer or FFmpeg) could help you improve your project’s design, performance, and accuracy.
Hope you enjoyed reading. In case I missed any framework, let me know in the comments.