LLM inference API
We provide an LLM inference API as an additional service.
The API is partially compatible with the OpenAI API.
The base URL is https://llm.comtegra.cloud/v1.
The supported endpoints are:
- /chat/completions
- /embeddings
- /audio/transcriptions
Requests are authenticated using a bearer token. To start using the API, you only need to generate one via the CGC client (see API keys below).
Usage examples
Chat
Python:
from openai import OpenAI
client = OpenAI(base_url="https://llm.comtegra.cloud/v1",
                api_key="YOUR-API-SECRET")
res = client.chat.completions.create(model="llama3-8b", max_completion_tokens=100,
                                     messages=[{"role": "user", "content": "Hi"}])
print(res)
curl:
curl \
  -H 'Authorization: Bearer YOUR-API-SECRET' \
  -H 'Content-Type: application/json' \
  -d '{"model": "llama3-8b", "messages": [{"role": "user", "content": "Hi"}]}' \
  'https://llm.comtegra.cloud/v1/chat/completions'
This and the other curl examples on this page are for the curl program. In Microsoft's PowerShell, curl is an alias for Invoke-WebRequest, which is not compatible with the real curl.
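The chat endpoint follows the OpenAI schema, so streaming may also work through the standard stream parameter. Whether streaming is enabled on this deployment is an assumption on our part; treat the following as a sketch:

from openai import OpenAI

client = OpenAI(base_url="https://llm.comtegra.cloud/v1",
                api_key="YOUR-API-SECRET")
# stream=True asks the server to send the reply incrementally;
# each chunk carries a small delta of the generated text.
stream = client.chat.completions.create(model="llama3-8b", stream=True,
                                        messages=[{"role": "user", "content": "Hi"}])
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()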
Embeddings
Python:
from openai import OpenAI
client = OpenAI(base_url="https://llm.comtegra.cloud/v1",
                api_key="YOUR-API-SECRET")
res = client.embeddings.create(model="gte-qwen2-7b", input="Mary had a little lamb")
print(res)
curl:
curl \
  -H 'Authorization: Bearer YOUR-API-SECRET' \
  -H 'Content-Type: application/json' \
  -d '{"model": "gte-qwen2-7b", "input": "Mary had a little lamb", "pooling": "mean"}' \
  'https://llm.comtegra.cloud/v1/embeddings'
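The response contains one vector per input. As a follow-up, here is a minimal sketch of comparing two texts by cosine similarity; it assumes the endpoint accepts a list of inputs in one request, as the OpenAI embeddings API does:

from openai import OpenAI

client = OpenAI(base_url="https://llm.comtegra.cloud/v1",
                api_key="YOUR-API-SECRET")
res = client.embeddings.create(model="gte-qwen2-7b",
                               input=["Mary had a little lamb",
                                      "A girl owned a small sheep"])
a, b = res.data[0].embedding, res.data[1].embedding

# Cosine similarity in plain Python: dot product over the product of norms.
dot = sum(x * y for x, y in zip(a, b))
norm_a = sum(x * x for x in a) ** 0.5
norm_b = sum(x * x for x in b) ** 0.5
print(dot / (norm_a * norm_b))  # close to 1.0 for semantically similar texts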
Transcriptions
Python:
from openai import OpenAI
client = OpenAI(base_url="https://llm.comtegra.cloud/v1",
                api_key="YOUR-API-SECRET")
with open("recording.mp3", "rb") as f:
    res = client.audio.transcriptions.create(model="whisper-1", file=f)
print(res)
curl:
curl \
  -H 'Authorization: Bearer YOUR-API-SECRET' \
  -F file=@'recording.mp3' \
  -F model='whisper-1' \
  'https://llm.comtegra.cloud/v1/audio/transcriptions'
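With the Python client, the plain transcript is available as the text attribute of the returned object, so print(res.text) prints just the transcribed words rather than the whole response object.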
API keys
To generate a new API key with access to the LLM service, run:
cgc api-keys create --level LLM
The output will contain an API secret. Save it somewhere convenient and safe; you'll need to present it with every request you make to the API. There's no way to retrieve it a second time, so if you lose it, delete it and create another one. Keep it secret, as any use of it generates costs for your organization.
You may add a comment to a new API key so that it's easier to identify later.
cgc api-keys create --level LLM --comment "Mike's key"
See cgc api-keys --help for all commands and options.
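Since the secret should not be pasted into scripts, you may prefer to load it from an environment variable. A minimal sketch; the variable name CGC_LLM_API_KEY is our own convention, not something set by the CGC client:

import os

from openai import OpenAI

# Read the secret from the environment instead of hard-coding it.
# Set it once in your shell: export CGC_LLM_API_KEY='...'
client = OpenAI(base_url="https://llm.comtegra.cloud/v1",
                api_key=os.environ["CGC_LLM_API_KEY"])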
Billing
Requests are billed per token. The price per token depends on the model you use and the GPU it runs on: the larger the model and the faster the GPU, the higher the price. Both prompt (input) and completion (output) tokens are billed to your organization's invoice. The price list is available on our pricing page.
You can view token usage and cost using cgc billing status.
For example, suppose you use the Meta-Llama 3.1-70B-Instruct-Q5_K_M model running on an NVIDIA A100 GPU, with a prompt token price of 19,78 zł / 1M tokens and a completion token price of 167,66 zł / 1M tokens. Your prompt is "Write a haiku about ChatGPT", which is 18 tokens long. You get the response "Silicon whispers. ChatGPT's gentle responses. Knowledge at my door", which is 16 tokens long. Your organization will be billed 19,78 zł × 18 / 1 000 000 + 167,66 zł × 16 / 1 000 000 = 0,003039 zł for this request.
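The same arithmetic in a few lines of Python; in practice, the token counts for a request are reported in the usage field of the API response:

# Prices from the example above, in zł per token.
PROMPT_PRICE = 19.78 / 1_000_000
COMPLETION_PRICE = 167.66 / 1_000_000

prompt_tokens, completion_tokens = 18, 16  # as reported in res.usage
cost = prompt_tokens * PROMPT_PRICE + completion_tokens * COMPLETION_PRICE
print(f"{cost:.6f} zł")  # 0.003039 zł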
Self-managed instance
If you'd like to use multiple models in your own namespace, with access control and per-user token logs, you might want to consider running a self-managed instance of the LLM API.
Administration tasks such as user management and usage monitoring require SSH access to your LLM API container. You may configure it by following the instructions on the [SSH Access] page.
[SSH Access]: /Getting Started/ssh-access
Creating a self-managed instance
Available soon.
User management
To allow a new user to access the API, create a new API key for them. Log into your LLM API container and run the following command to create one. It's strongly recommended to add a comment to each key indicating its user or purpose; without comments, keys are hard to tell apart. You may also set an expiration date, after which the key will no longer be valid.
llmproxyctl user create --comment 'John Smith' --expires '2030-01-01 13:37'
# with shorthand argument forms
llmproxyctl user create -t 'John Smith' -e '2030-01-01 13:37'
The program will print the details of the newly created user and the associated randomly generated API key.
User created
Expires: 2030-01-01 12:37:00+00:00
Comment: John Smith
Hash: dd2a60b77cb2
Plain API key: 3FpwuzyXdU-u3bm8hg9ipA3ZB7JFWSqYHPRw1EMeQsB-XuV5cJ_...
Please note that this is the only time you'll see the plain API key. It's not possible to recover it if it's lost, because it's stored in the database only in hashed form.
You may list all keys with the following command.
llmproxyctl user list
If you wish to edit the comment or expiration date of a user, use the llmproxyctl user update command.
To select the user to edit, look at the Hash column (the first column) in the user list and use the first few characters of the corresponding hash, as in the following example.
$ llmproxyctl user list
Hash Expires Status Comment
------------ -------------------- ------- -------
dd2a60b77cb2 2030-01-01 12:37:00 active John Smith
a26a23f52cd8 - active Sam Altman
da7534bd88a0 2022-10-26 22:00:00 expired Jack Dorsey
$ llmproxyctl user update -t "Elon Musk" -e '2030-01-01 00:00' da
User updated
$ llmproxyctl user list
Hash Expires Status Comment
------------ -------------------- ------- -------
dd2a60b77cb2 2030-01-01 12:37:00 active John Smith
a26a23f52cd8 - active Sam Altman
da7534bd88a0 2030-01-01 00:00:00 active Elon Musk
If you wish to disable a user, set their expiration date to the special value now.
llmproxyctl user update -e now da