CV model on Triton

1. Train and Save Your PyTorch Model

First, you need a trained PyTorch model. Once training is complete, save the model using PyTorch's torch.save() function.

import torch

# Assuming `model` is your trained PyTorch model
model_path = "path/to/your/model.pth"
torch.save(model, model_path)
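
Note that torch.save(model, ...) pickles the entire module, so the model's class definition must be importable wherever the file is loaded (including in the export step below). A common alternative is to save only the state_dict; a minimal sketch, with MyModel standing in for your own model class:

import torch

# Alternative: save only the weights rather than the whole pickled module
torch.save(model.state_dict(), "path/to/your/model_weights.pth")

# To load, re-create the architecture first (MyModel is a placeholder for your class)
model = MyModel()
model.load_state_dict(torch.load("path/to/your/model_weights.pth"))
model.eval()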

2. Convert the Model to ONNX Format

Triton Inference Server supports multiple model formats including ONNX, TensorFlow, and TensorRT. ONNX (Open Neural Network Exchange) is a popular choice for its cross-platform and cross-framework compatibility.

To convert a PyTorch model to ONNX:

import torch
import torch.onnx

# Load the trained model (saved with torch.save(model, ...) in step 1)
model = torch.load("path/to/your/model.pth")
model.eval()

# Create dummy input in the shape the model expects
dummy_input = torch.randn(1, 3, 224, 224)  # example input shape for an image model

# Export the model; input_names/output_names are optional but keep the tensor
# names in the ONNX graph consistent with config.pbtxt (step 3)
onnx_model_path = "path/to/exported/model.onnx"
torch.onnx.export(model, dummy_input, onnx_model_path,
                  input_names=["images0"], output_names=["output0"])

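It can be worth sanity-checking the exported file before writing the Triton config; a minimal check, assuming the onnx Python package is installed:

import onnx

# Load the exported model and validate the graph structure
onnx_model = onnx.load("path/to/exported/model.onnx")
onnx.checker.check_model(onnx_model)
print("ONNX export looks structurally valid")
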
3. Create a Configuration File for Triton

Triton requires a config.pbtxt file to understand how to serve your model. This file includes information about the model name, platform (ONNX, TensorFlow, etc.), input and output tensors, and other settings.

Here's an example config.pbtxt for an ONNX model:

name: "model_name"
platform: "onnxruntime_onnx"
input [
{
name: "images0" # Name of the input layer in your model
data_type: TYPE_FP32
dims: [3, 224, 224] # Replace with the input dimensions of your model
}
]
output [
{
name: "output0" # Name of the output layer in your model
data_type: TYPE_FP32
dims: [6, 17661] # Replace with the output dimensions of your model
}
]

You can use netron.app to inspect your model's specifics (input and output names, dimensions, etc.).
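
If you prefer to inspect the model programmatically rather than with netron.app, a small sketch using onnxruntime (assuming it is installed) prints the information needed for config.pbtxt:

import onnxruntime as ort

session = ort.InferenceSession("path/to/exported/model.onnx")

# Tensor names, shapes, and data types to copy into config.pbtxt
for inp in session.get_inputs():
    print("input:", inp.name, inp.shape, inp.type)
for out in session.get_outputs():
    print("output:", out.name, out.shape, out.type)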

4. Deploy the Model on Triton

Place the config.pbtxt file and your ONNX model in a model directory inside Triton's model repository, with the ONNX file under a numbered version subdirectory. By default, Triton expects the ONNX file to be named model.onnx (this can be overridden with default_model_filename in config.pbtxt). The directory structure should look like this:

└── /models-repository
    └── /model_name
        ├── config.pbtxt
        └── /1
            └── model.onnx
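
A small helper can lay the repository out for you; this sketch assumes the example paths used above and simply copies the exported file into place:

import shutil
from pathlib import Path

model_dir = Path("models-repository") / "model_name"
version_dir = model_dir / "1"
version_dir.mkdir(parents=True, exist_ok=True)

# Copy the config and the exported ONNX file into the expected layout
shutil.copy("config.pbtxt", model_dir / "config.pbtxt")
shutil.copy("path/to/exported/model.onnx", version_dir / "model.onnx")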

5. Start the Triton Server

To start your Triton Inference Server, please refer to How to run.
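
Once the server is up, you can send a quick test request with Triton's Python client (installable as tritonclient[http]); the model name, tensor names, and shapes below follow the example config above and must match your own model:

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a request matching the example config.pbtxt above
image = np.random.rand(3, 224, 224).astype(np.float32)
inputs = [httpclient.InferInput("images0", list(image.shape), "FP32")]
inputs[0].set_data_from_numpy(image)
outputs = [httpclient.InferRequestedOutput("output0")]

result = client.infer(model_name="model_name", inputs=inputs, outputs=outputs)
print(result.as_numpy("output0").shape)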