Vision API
Overview
The Computer Vision API is intended to provide an end-to-end solution for computer vision tasks. It combines preprocessing, inference, postprocessing, and serving in a single endpoint request. It uses Triton as the inference source, combined with GPU-accelerated image and video processing operations (bounding boxes, blur, Kalman filter, etc.). For serving streams it utilizes GPU-accelerated GStreamer pipelines.
Key Features:
- Image file as input, detection inference, processed image file as output
- Video file as input, detection inference, processed video file as output
- Stream as input, detection inference, processed RTSP stream as output
How to run
As Triton is the inference source, you first need to ensure you have set up Triton.
For now the Vision API accepts only the standard detection output [x1, x2, y1, y2, score, class_1, class_2 ..]. If your model has a different output format, the API won't accept it.
The Computer Vision API is available as a compute resource, and it requires a different amount of resources depending on the processing operations you choose and the frame resolution of the source. However, the app will always require a GPU that supports image processing operations; the best choice is the A5000. As a starting point we recommend 1.5 CPU cores and 1.5 GB RAM per video/stream you will be processing. These numbers can grow as you add operations or bring in higher-resolution sources.
To run an instance, use the following command:
cgc compute create --name vision-api -c 8 -m 12 -g 1 -gt A5000 vision-api
After initialization, you can access Swagger to view all endpoints at:
https://vision-api.<namespace>.cgc-waw-01.comtegra.cloud/docs
How to use
Every endpoint will require you to pass an app token as authorization. This is the same app token you can view with:
cgc compute list -d
As listed in Key Features, there are 3 ways of using the Computer Vision API, depending on the desired input and output. The first 2, in which you request a file as output, are similar in usage.
First you upload a file to /image/upload or /video/upload and you will receive a UUID that represents the source. Then you can include it in an /image/to-image or /video/to-video request.
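For example, the initial upload with curl might look like this (a sketch assuming the upload endpoint accepts a standard multipart form field named file; check Swagger for the exact field name):

curl -X 'POST' \
  'https://vision-api.<namespace>.cgc-waw-01.comtegra.cloud/image/upload' \
  -H 'accept: application/json' \
  -H 'app-token: <your-app-token>' \
  -F 'file=@example.jpg'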
Pay attention to the request body:
{
  "source_file_uuid": "98ada9eb-daf6-4ee4-b4be-9fab7abdf619",
  "tritonServer": {
    "triton_url": "triton",
    "infer_model_name": "yolov8n",
    "version": "1"
  },
  "inference": {
    "confidence_threshold": 0.25,
    "iou_threshold": 0.45
  },
  "postprocess": {
    "bbox": {
      "apply": true,
      "rectangle_color": "default",
      "text_color": "default",
      "rectangle_thickness": 2,
      "text_size": 0.5,
      "text_thickness": 2,
      "include_text": false
    },
    "blur": {
      "apply": false,
      "blur_intensity": [99, 99]
    },
    "pose": {
      "apply": true,
      "keypoints": {
        "apply": true,
        "keypoint_color": [0, 165, 255],
        "line_color": [255, 165, 0],
        "radius": 5,
        "line_thickness": 2
      },
      "pose_classifier": {
        "apply": true,
        "slope_threshold": 40,
        "fall_detector": {
          "apply": false,
          "alert_threshold": 2
        }
      }
    }
  },
  "output_format": "jpg"
}
Here you can pass the received UUID and change anything you want about the inference and postprocessing. After you fill in the payload, send the request and wait. Once the processing is finished, you will receive a file containing the processed image or video.
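For instance, assuming you saved the body above as payload.json and that the response body is the processed file itself, a request to /image/to-image could look like this (the output filename is arbitrary):

curl -X 'POST' \
  'https://vision-api.<namespace>.cgc-waw-01.comtegra.cloud/image/to-image' \
  -H 'accept: application/json' \
  -H 'app-token: <your-app-token>' \
  -H 'Content-Type: application/json' \
  -d @payload.json \
  -o processed.jpg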
The other two endpoints, which output an RTSP stream, require a similar request, but they immediately return a stream URL. You can capture the stream using this URL. By default, the RTSP stream is available only in your namespace. If you want to view it on external machines, see the Detailed information section.
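For example, you could preview the stream with ffplay (assuming ffplay is installed; substitute the URL returned by the API):

# Request RTSP over TCP, matching the transport used by the app
ffplay -rtsp_transport tcp '<stream-url-returned-by-the-api>'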
Once you don't need the stream anymore, you can delete it by passing its stream_id to /stream/delete-stream.
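A sketch of such a request, assuming stream_id is passed as a query parameter and the endpoint uses POST (Swagger shows the exact method and parameter location):

curl -X 'POST' \
  'https://vision-api.<namespace>.cgc-waw-01.comtegra.cloud/stream/delete-stream?stream_id=<stream-id>' \
  -H 'accept: application/json' \
  -H 'app-token: <your-app-token>'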
There is also an info endpoint, /info/get-models, which retrieves the models available on Triton along with information about them.
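For example (assuming it is a plain GET request; check Swagger for details):

curl -X 'GET' \
  'https://vision-api.<namespace>.cgc-waw-01.comtegra.cloud/info/get-models' \
  -H 'accept: application/json' \
  -H 'app-token: <your-app-token>'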
If you prefer to use the API outside Swagger, you can make requests like:
curl -X 'POST' \
  'https://<appname>.<namespace>.cgc-waw-01.comtegra.cloud/video/to-stream' \
  -H 'accept: application/json' \
  -H 'app-token: 8d23ae613a4e46119f4d52cb25e8b551' \
  -H 'Content-Type: application/json' \
  -d '{
    "source_file_uuid": "ac839d89-14c7-4116-b5d8-30c34c714971",
    "tritonServer": {
      "triton_url": "triton",
      ...
    }
  }'
putting the app token in the header and the parameters in the body.
Detailed information
Optional app params
Additional optional parameters you can pass to the create command in the CLI:
- MAX_STREAMS - maximum number of streams that can run at a time; by default it is cpu cores / 2.
- BUFFER_SIZE - size of the network buffers used by the app. It is best to make it no larger than your MTU. By default, 9000.
- GST_SERVERS - number of GStreamer servers; by default it is MAX_STREAMS / 5. If your sources are of higher or lower resolution, you can experiment with this parameter to get optimal resource consumption.
Payload params
The payload has a couple of sections, beginning with the Triton section, where we need to specify the Triton host. In the case of CGC, a Triton instance deployed in the same namespace is available under its container name, for example: triton. Additionally, we select the model we want to use, its version, and the triton_request_concurrency, which is available exclusively for the video endpoint and indicates how many clients will query Triton simultaneously. This is an example of a parameter that affects both the execution time of the request and resource consumption, which must be taken into account.
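As an illustration, the Triton section of a video request might look like this (a fragment of the payload shown earlier; the concurrency value here is only an assumed example):

"tritonServer": {
  "triton_url": "triton",
  "infer_model_name": "yolov8n",
  "version": "1",
  "triton_request_concurrency": 4
}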
The inference section covers the parameters typical for model inference:
- confidence threshold
- intersection over union threshold
The postprocess section concerns everything that happens after obtaining inference results. We can choose to overlay boxes in places where the model has successfully detected something, blur those areas, or use a Kalman filter to improve detection continuity.
Pose estimation can be configured by setting pose to true; this is required for proper functionality when using pose estimation models.
The pose estimation section includes two key options:
- keypoints: enabled by default, this option draws keypoints representing human joints.
- pose_classifier: this option allows the classification of poses, such as "Standing" or "Laying". Additionally, a fall detection feature can be enabled, which will trigger an alert on the screen if a fall is detected.
At the moment the pose estimation feature only supports YOLO-pose models with 17 COCO keypoints.
Each option has sub-options that modify its behavior. Again, it is important to remember that the more postprocessing we add, the longer the request will take and the more resources it will consume. All options except boxes are disabled by default. In the case of colors, the default option "default" means that every detected class will have a different random color assigned to it. If you want a specific color, you can pass it in the form of an RGB array, e.g. [255, 0, 0]; then EVERY class will use this color. This can be useful when you have only one or a couple of classes.
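For example, a bbox section forcing red boxes with labels enabled could look like this (a sketch based on the payload above; the values are illustrative):

"bbox": {
  "apply": true,
  "rectangle_color": [255, 0, 0],
  "text_color": "default",
  "rectangle_thickness": 2,
  "text_size": 0.5,
  "text_thickness": 2,
  "include_text": true
}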
Finally, there is the output format. Currently, only one format is implemented, so the default setting should not be changed.
RTSP streams
RTSP uses TCP for transport, but CGC does not provide an option to open TCP ports from the CLI. If you want to view the stream on external machines, you will need to add output TCP ports manually.
Every GStreamer server uses a separate TCP port. The first one serves on 9001 and each subsequent one on the next port, so 9002, 9003, and so on. You will need to add a TCP port for every one of them.