Skip to main content

LLM inference API

We provide an LLM inference API as an additional service. The API is partially compatible with the OpenAI API. The base URL is https://llm.comtegra.cloud/.

The supported endpoints are:

Requests are authenticated using a bearer token, so an example request looks like this:

POST /v1/chat/completions HTTP/2
Host: llm.comtegra.cloud
Content-Type: application/json
Authorization: Bearer YOUR-API-SECRET
Content-Length: ...

{"model": "llama31-70b", "messages": ...}

To start using the API you only need to generate a special API key via the CGC client.

API keys​

cgc api-keys create --level LLM

The output will contain an API secret. Save it somewhere convenient and safe. You'll need it to present it for every request you make to the API. There's no way to retrieve it a second time. If you lose it, delete it and create another one. Keep it secret as using it generates costs for your organization.

You may add a comment to a new API key so that it's easier to identify later.

cgc api-keys create --level LLM --comment "Mike's key"

See cgc api-keys --help for all commands and options.

Billing​

Requests are billed on a per-token basis. The price per token depends on the model you use and the GPU it's running on. The larger the model, the higher the price. The faster the GPU, the higher the price. Completion (output) tokens are added to your organization's invoice as well as prompt (input) tokens. Price list is available on our pricing page.

You can view token usage and cost using cgc billing status.

For example, you use the Meta-Llama 3.1-70B-Instruct-Q5_K_M model running on a NVIDIA A100 GPU. Let's assume prompt token price is 19,78 zł / 1M tok. and completion token price is 167,66 zł / 1M tok. Your prompt is: Write a haiku about ChatGPT, which is 18 tokens long. You get the following response: Silicon whispers. ChatGPT's gentle responses. Knowledge at my door, which is 16 tokens long. Your organization will be billed 19,78 zł * 18 / 1000000 + 167,66 zł * 16 / 1000000 = 0,003039 zł for this request.