Vision API

Overview​

The Computer Vision API is intended to provide an end-to-end solution for computer vision tasks. It combines preprocessing, inference, postprocessing, and serving in one endpoint request. It uses Triton as the inference source, combined with GPU-accelerated image and video processing operations (bounding boxes, blur, Kalman filter, etc.). For serving streams it utilizes GPU-accelerated Gstreamer pipelines.

Key Features:

  • Image file as input, processed image file as output
  • Video file as input, processed video file as output
  • Stream as input, processed RTSP stream as output

How to run​

As Triton is the inference source, you first need to ensure you have set up Triton.

The Computer-Vision API is available as a compute resource, and it requires a different amount of resources depending on the processing operations you choose and the frame resolution of the source. However, the app will always require a GPU that supports image processing operations; the best choice is the A5000. As a starting point, we recommend 1.5 CPU cores and 1.5 GB RAM per video/stream you will be processing. These numbers can grow as you add operations or use higher-resolution sources.

To run an instance, use the following command:

cgc compute create --name vision-api -c 8 -m 12 -g 1 -gt A5000 vision-api

After initialization, you can access Swagger to view all endpoints at:

https://vision-api.<namespace>.cgc-waw-01.comtegra.cloud/docs

How to use​

Every endpoint will require you to pass an app token as authorization. This is the same app token you can view with:

cgc compute list -d

As listed in Key Features, there are four ways of using the Computer Vision API depending on the desired input and output. The first two, which return a file as output, are similar in usage.

First, you upload a file to
/image/upload or /video/upload
and you will receive a UUID that represents the source. Then you can include it in an
image/to-image or video/to-video request.
Pay attention to the request body:

{
  "source_file_uuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "tritonServer": {
    "triton_url": "triton",
    "infer_model_name": "yolov8n_dynamic_ims480_fp16",
    "version": "1"
  },
  "inference": {
    "confidence_threshold": 0.25,
    "iou_threshold": 0.45
  },
  "postprocess": {
    "bbox": {
      "apply": true,
      "rectangle_color": [0, 255, 0],
      "text_color": [0, 255, 0],
      "rectangle_thickness": 2,
      "text_size": 0.5,
      "text_thickness": 2,
      "include_text": true
    },
    "blur": {
      "apply": false,
      "blur_intensity": [99, 99]
    }
  },
  "output_format": "jpg"
}

Here you pass the received UUID and change anything you want about the inference and postprocessing. After you fill the payload, send the request and wait. Once the processing finishes, you will receive a file containing the processed image or video.
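As an illustration, here is a minimal Python sketch of the upload-then-process flow using only the standard library. The endpoint paths and payload fields come from the section above; the host, token, and helper names are placeholders, so adjust them to your deployment:

```python
import json
import urllib.request

BASE_URL = "https://vision-api.<namespace>.cgc-waw-01.comtegra.cloud"  # placeholder host
APP_TOKEN = "<your-app-token>"  # shown by `cgc compute list -d`

def build_payload(source_uuid, apply_blur=False):
    """Build the /image/to-image request body with the defaults shown above."""
    return {
        "source_file_uuid": source_uuid,
        "tritonServer": {
            "triton_url": "triton",
            "infer_model_name": "yolov8n_dynamic_ims480_fp16",
            "version": "1",
        },
        "inference": {"confidence_threshold": 0.25, "iou_threshold": 0.45},
        "postprocess": {
            "bbox": {
                "apply": True,
                "rectangle_color": [0, 255, 0],
                "text_color": [0, 255, 0],
                "rectangle_thickness": 2,
                "text_size": 0.5,
                "text_thickness": 2,
                "include_text": True,
            },
            "blur": {"apply": apply_blur, "blur_intensity": [99, 99]},
        },
        "output_format": "jpg",
    }

def process_image(source_uuid):
    """POST the payload to /image/to-image and return the processed file bytes."""
    req = urllib.request.Request(
        f"{BASE_URL}/image/to-image",
        data=json.dumps(build_payload(source_uuid)).encode(),
        headers={"app-token": APP_TOKEN, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

The same pattern applies to video/to-video with a UUID obtained from /video/upload.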

The remaining two, which output an RTSP stream, require a similar request but immediately return a stream URL. You can capture the stream using this URL. By default, the RTSP stream is available only within your namespace. If you want to view it on external machines, see the Detailed information section.

Once you no longer need a stream, you can delete it by passing its stream_id to /stream/delete-stream.

There is also an info endpoint, info/get-models, that retrieves the models available on Triton along with information about them.
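For instance, the model listing can be queried with a short standard-library sketch; the host and token are placeholders, and the exact response shape should be checked in Swagger:

```python
import urllib.request

BASE_URL = "https://vision-api.<namespace>.cgc-waw-01.comtegra.cloud"  # placeholder host
APP_TOKEN = "<your-app-token>"  # shown by `cgc compute list -d`

def build_models_request() -> urllib.request.Request:
    """Build the GET request for info/get-models; authorization goes in the app-token header."""
    return urllib.request.Request(
        f"{BASE_URL}/info/get-models",
        headers={"app-token": APP_TOKEN, "accept": "application/json"},
    )

# To actually query the endpoint (requires network access to your namespace):
# import json
# with urllib.request.urlopen(build_models_request()) as resp:
#     models = json.loads(resp.read())
```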

If you prefer not to use the API via Swagger, you can make requests like:

curl -X 'POST' \
'https://<appname>.<namespace>.cgc-waw-01.comtegra.cloud/video/to-stream' \
-H 'accept: application/json' \
-H 'app-token: 8d23ae613a4e46119f4d52cb25e8b551' \
-H 'Content-Type: application/json' \
-d '{
  "source_file_uuid": "ac839d89-14c7-4116-b5d8-30c34c714971",
  "tritonServer": {
    "triton_url": "triton",
    ...
}

putting the app token in the header and the params in the body.

Detailed information​

Optional app params​

Additional optional parameters you can pass to the create command in the CLI:

  • MAX_STREAMS - maximum number of streams that can run at a time; by default it is CPU cores / 2.
  • BUFFER_SIZE - size of the network buffers used by the app. Best to keep it no larger than your MTU. By default 9000.
  • GST_SERVERS - number of Gstreamer servers; by default it is MAX_STREAMS / 5. If your sources have higher or lower resolution, you can experiment with this param to reach optimal resource consumption.
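The default values above can be summarised in a small sketch. Integer division and the minimum of one Gstreamer server are assumptions here, not documented behavior:

```python
def default_app_params(cpu_cores: int) -> dict:
    """Derive the documented defaults from the CPU core count (integer division assumed)."""
    max_streams = cpu_cores // 2               # MAX_STREAMS default: CPU cores / 2
    return {
        "MAX_STREAMS": max_streams,
        "BUFFER_SIZE": 9000,                   # default network buffer size
        "GST_SERVERS": max(1, max_streams // 5),  # MAX_STREAMS / 5, at least one (assumption)
    }
```

With the `-c 8` instance from the run command above, this yields 4 streams served by a single Gstreamer server.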

Payload params​

The payload has a few sections, beginning with the Triton section, where we need to specify the Triton host. In the case of CGC, a Triton instance deployed in the same namespace is available under its container name, for example: triton. Additionally, we select the model we want to use, its version, and the triton_request_concurrency, which is available exclusively for the video endpoint and indicates how many clients will query Triton simultaneously. This is an example of a parameter that affects both the execution time of the query and resource consumption, which must be taken into account.

The inference section pertains to topics typical for model inference:

  • confidence threshold
  • intersection over union threshold

The postprocess section concerns everything that happens after obtaining inference results. We can choose to overlay boxes where the model has detected something, blur those areas, or use a Kalman filter to improve detection continuity. Each option has sub-options that modify its behavior. Again, it is important to remember that the more postprocessing we add, the longer the query takes and the more resources it consumes. All options except bounding boxes are disabled by default.

Finally, there is the output format. Currently, only one format is implemented, so the default setting should not be changed.

RTSP streams​

RTSP uses TCP for transport, but CGC does not provide an option to open a TCP port from the CLI. If you want to view the stream on external machines, you will need to add output TCP ports manually.

caution

Every Gstreamer server uses a separate TCP port. The first one serves on 9001 and each subsequent server on the next port (9002, 9003, ...). You will need to add a TCP port for every one of them.
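The port numbering rule above can be expressed as a one-liner (1-based server index; the 9001 base comes from the caution above):

```python
def gst_server_port(server_index: int) -> int:
    """TCP port for the n-th Gstreamer server (1-based): 9001, 9002, 9003, ..."""
    return 9000 + server_index
```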