Weaviate
Weaviate is an open-source, AI-native vector database that helps developers create intuitive and reliable AI-powered applications.
Weaviate comes with a custom vectorizer powered by the intfloat/multilingual-e5-large model.
More about the model can be found at the official Hugging Face repo.
The vectorizer has to be run separately as a compute resource.
How to run it
If you want to run a basic database, you just need to create a new database through cgc:
cgc db create --name weaviate01 -c 4 -m 8 -v weaviate_volume weaviate
If you'd like to use the external vectorizer, you need to run it first:
cgc compute create --name vectorizer -c 4 -m 24 -g 1 -gt A5000 t2v-transformers
For more information about the vectorizer, please refer to its documentation page.
Once the vectorizer is running, pass additional parameters to the database startup command:
cgc db create --name weaviate01 -c 4 -m 8 -v weaviate_volume weaviate -e weaviate_enable_modules=text2vec-transformers -e weaviate_transformers_inference_api=http://<VECTORIZER_RESOURCE_NAME>:8080
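The vectorizer name has to match in both commands, because the database reaches the vectorizer at http://<VECTORIZER_RESOURCE_NAME>:8080. As a minimal sketch, the two commands can be assembled in Python so the name is written only once. The helper names `vectorizer_command` and `weaviate_command` are hypothetical and not part of CGC; the flags and values are copied from the commands above.

```python
import shlex


def vectorizer_command(name="vectorizer"):
    """Build the compute command shown above for the t2v-transformers vectorizer."""
    return ["cgc", "compute", "create", "--name", name,
            "-c", "4", "-m", "24", "-g", "1", "-gt", "A5000",
            "t2v-transformers"]


def weaviate_command(db_name, volume, vectorizer_name=None):
    """Build the database command; add the module settings only if a vectorizer is used."""
    cmd = ["cgc", "db", "create", "--name", db_name,
           "-c", "4", "-m", "8", "-v", volume, "weaviate"]
    if vectorizer_name:
        cmd += [
            "-e", "weaviate_enable_modules=text2vec-transformers",
            "-e", f"weaviate_transformers_inference_api=http://{vectorizer_name}:8080",
        ]
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    print(shlex.join(vectorizer_command("vectorizer")))
    print(shlex.join(weaviate_command("weaviate01", "weaviate_volume", "vectorizer")))
```

Printing rather than executing keeps the sketch free of side effects; the output can be pasted into a shell once it looks right.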
Default configuration
The default configuration for Weaviate is set to use the Weaviate database engine. The database will be created with the following parameters:
QUERY_DEFAULTS_LIMIT=20: The default limit for queries.
AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=false: Disables anonymous access to the database.
AUTHENTICATION_APIKEY_ENABLED=true: Enables API key authentication.
AUTHENTICATION_APIKEY_ALLOWED_KEYS: The list of allowed API keys for authentication. This is set to the CGC-specific app_token that you receive after creating the database.
AUTHENTICATION_APIKEY_USERS=admin@localhost: The list of users allowed to access the database.
PERSISTENCE_DATA_PATH: The path to the data directory where the database will store its data. This is set to /mnt/vectordb.
CLUSTER_HOSTNAME=node1: The hostname of the database node.
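For reference, the defaults above can be written as a plain Python mapping, which makes the practical consequence easy to see: with anonymous access disabled and API key authentication enabled, every client has to present the app_token. This is only an illustrative sketch; the names `WEAVIATE_DEFAULTS`, `effective_env`, and `requires_api_key` are hypothetical and do not come from CGC or Weaviate.

```python
# Defaults from the list above. AUTHENTICATION_APIKEY_ALLOWED_KEYS is left out
# here because its value is the app_token issued for each database.
WEAVIATE_DEFAULTS = {
    "QUERY_DEFAULTS_LIMIT": "20",
    "AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED": "false",
    "AUTHENTICATION_APIKEY_ENABLED": "true",
    "AUTHENTICATION_APIKEY_USERS": "admin@localhost",
    "PERSISTENCE_DATA_PATH": "/mnt/vectordb",
    "CLUSTER_HOSTNAME": "node1",
}


def effective_env(app_token):
    """Return the defaults with the allowed API key filled in."""
    env = dict(WEAVIATE_DEFAULTS)
    env["AUTHENTICATION_APIKEY_ALLOWED_KEYS"] = app_token
    return env


def requires_api_key(env):
    """True when anonymous access is off and API key authentication is on."""
    anonymous = env.get("AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED", "").lower() == "true"
    api_key = env.get("AUTHENTICATION_APIKEY_ENABLED", "").lower() == "true"
    return api_key and not anonymous


if __name__ == "__main__":
    env = effective_env("<WEAVIATE_TOKEN>")
    for key in sorted(env):
        print(f"{key}={env[key]}")
    print("API key required:", requires_api_key(env))
```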
How to connect
We are working on incorporating the Weaviate client into the CGC SDK, but for now, please use the official client installed with pip.
In your notebook environment you can connect to the database like this.
First, install the Weaviate client:
!pip install weaviate-client
Then obtain your Weaviate token from the output of:
cgc db list -d
Next, import the client and make a connection:
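import weaviate

# Assumption: the host is the database resource name; adjust it if yours differs (e.g. weaviate01).
WEAVIATE_URL = "http://weaviate:8080"
# Token shown by `cgc db list -d`.
auth_client_secret = weaviate.AuthApiKey(api_key="<WEAVIATE_TOKEN>")

# weaviate.Client is the v3-style client API; newer weaviate-client releases may need a different call.
client = weaviate.Client(
    url=WEAVIATE_URL,
    auth_client_secret=auth_client_secret,
)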
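To check the URL and token independently of the client library, one option is to call Weaviate's REST API directly. The sketch below uses only the Python standard library and requests the /v1/meta endpoint with the token as a Bearer header. It assumes the database is reachable at the URL used above; `build_meta_request` and `fetch_meta` are hypothetical helper names, and the exact fields of the response may vary between Weaviate versions.

```python
import json
import urllib.request


def build_meta_request(base_url, api_key):
    """Prepare a GET request for /v1/meta with API key authentication."""
    url = base_url.rstrip("/") + "/v1/meta"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def fetch_meta(base_url, api_key, timeout=5.0):
    """Return the parsed /v1/meta response, or None if the call fails."""
    try:
        request = build_meta_request(base_url, api_key)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers connection and HTTP errors; ValueError covers invalid JSON.
        return None


if __name__ == "__main__":
    meta = fetch_meta("http://weaviate:8080", "<WEAVIATE_TOKEN>")
    if meta is None:
        print("Could not reach Weaviate; check the URL and the token.")
    else:
        # "version" and "modules" are expected keys, read defensively.
        print("Version:", meta.get("version"))
        print("Modules:", list(meta.get("modules", {})))
```

If the database was started with the external vectorizer, text2vec-transformers should appear among the reported modules.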