Weaviate

Weaviate is an open-source, AI-native vector database that helps developers create intuitive and reliable AI-powered applications.

Weaviate comes with a custom vectorizer, powered by the intfloat/multilingual-e5-large model.
More information about the model is available in its official Hugging Face repository.

Vectorizer

The vectorizer has to be run separately, as a compute resource.

How to run it

To run a basic database, create a new database with cgc:

cgc db create --name weaviate01 -c 4 -m 8 -v weaviate_volume weaviate

To use the external vectorizer, start it first as a compute resource:

cgc compute create --name vectorizer -c 4 -m 24 -g 1 -gt A5000 t2v-transformers

For more information about the vectorizer, please refer to its documentation page.

Once the vectorizer is running, pass these additional parameters to the database startup command:

cgc db create --name weaviate01 -c 4 -m 8 -v weaviate_volume weaviate -e weaviate_enable_modules=text2vec-transformers -e weaviate_transformers_inference_api=http://<VECTORIZER_RESOURCE_NAME>:8080
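The placeholder `<VECTORIZER_RESOURCE_NAME>` is the name given to the vectorizer in `cgc compute create`; it becomes the host part of the inference URL. The following is a minimal Python sketch that assembles the command, assuming the resource name `vectorizer` from the example above. The helper name `weaviate_create_command` is hypothetical, and the code only builds the command line without running it.

```python
import shlex


def weaviate_create_command(db_name, volume, vectorizer_name, cpu=4, memory=8):
    """Build the `cgc db create` arguments for Weaviate with an external vectorizer."""
    # The vectorizer is addressed by its resource name on port 8080.
    inference_url = f"http://{vectorizer_name}:8080"
    return [
        "cgc", "db", "create",
        "--name", db_name,
        "-c", str(cpu),
        "-m", str(memory),
        "-v", volume,
        "weaviate",
        "-e", "weaviate_enable_modules=text2vec-transformers",
        "-e", f"weaviate_transformers_inference_api={inference_url}",
    ]


# "vectorizer" is the name used in the `cgc compute create` example.
command = weaviate_create_command("weaviate01", "weaviate_volume", "vectorizer")
print(shlex.join(command))
```

Printing the command before running it makes it easy to check that the vectorizer name in the URL matches the compute resource that was actually created.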

Default configuration

By default, the Weaviate database is created with the following parameters:

  • QUERY_DEFAULTS_LIMIT=20: The default limit for queries.
  • AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=false: Disables anonymous access to the database.
  • AUTHENTICATION_APIKEY_ENABLED=true: Enables API key authentication.
  • AUTHENTICATION_APIKEY_ALLOWED_KEYS: The list of API keys accepted for authentication. This is set to the CGC-specific app_token that you receive after creating the database.
  • AUTHENTICATION_APIKEY_USERS=admin@localhost: The user identity associated with the allowed API key.
  • PERSISTENCE_DATA_PATH: The path to the data directory where the database will store its data. This is set to /mnt/vectordb.
  • CLUSTER_HOSTNAME=node1: The hostname of the database node.
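Because anonymous access is disabled and API key authentication is enabled, every request has to carry the app token. Weaviate's REST API accepts the key as a bearer token in the `Authorization` header. The sketch below prepares such a request with the Python standard library, assuming the `http://weaviate:8080` address used in the connection example on this page; the helper names are illustrative, not part of any SDK.

```python
import json
import urllib.request


def auth_headers(app_token):
    """Return the HTTP header that carries the Weaviate API key."""
    return {"Authorization": f"Bearer {app_token}"}


def build_meta_request(base_url, app_token):
    """Prepare (but do not send) a GET request to the /v1/meta endpoint."""
    return urllib.request.Request(
        f"{base_url}/v1/meta",
        headers=auth_headers(app_token),
    )


# Replace <WEAVIATE_TOKEN> with your app token.
request = build_meta_request("http://weaviate:8080", "<WEAVIATE_TOKEN>")

# To send the request from an environment that can reach the database:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

A request without the header, or with a key that is not in AUTHENTICATION_APIKEY_ALLOWED_KEYS, should be rejected by the server.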

How to connect

We are working on incorporating the Weaviate client into the CGC SDK. For now, please use the official client installed with pip.
In your notebook environment, you can connect to the database as follows.

First, install the Weaviate client:

!pip install "weaviate-client<4"

Then obtain your Weaviate token from the output of:

cgc db list -d

Next, import the client and create a connection:

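import weaviate

# Address of the database as used in this guide; adjust the host if your
# database resource is reachable under a different name.
WEAVIATE_URL = "http://weaviate:8080"

# Replace <WEAVIATE_TOKEN> with the app token shown by `cgc db list -d`.
auth_client_secret = weaviate.AuthApiKey(api_key="<WEAVIATE_TOKEN>")

# weaviate.Client is the v3 client API of weaviate-client.
client = weaviate.Client(
    url=WEAVIATE_URL,
    auth_client_secret=auth_client_secret,
)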
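Once connected, a typical next step is to define a class that uses the external vectorizer. The following is a sketch only, assuming the database was started with the text2vec-transformers module enabled and that `client` is the v3 client object created above. The class name `Article`, its properties, and the helper names are hypothetical.

```python
def article_class_schema(class_name="Article"):
    """Return a class definition that delegates vectorization to text2vec-transformers."""
    return {
        "class": class_name,
        "vectorizer": "text2vec-transformers",
        "properties": [
            {"name": "title", "dataType": ["text"]},
            {"name": "body", "dataType": ["text"]},
        ],
    }


def ensure_class(client, schema):
    """Create the class unless it already exists; return True if it was created."""
    existing = client.schema.get().get("classes", [])
    if any(item["class"] == schema["class"] for item in existing):
        return False
    client.schema.create_class(schema)
    return True


# Usage with the client from the connection step:
# if client.is_ready():
#     ensure_class(client, article_class_schema())
```

Checking for the class first keeps the notebook cell safe to re-run, since creating a class that already exists results in an error from the server.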