Certainly! I’ll create documentation for the Chroma
class, including information about initializing it with a YAML configuration file and explaining the differences between the various client types. Let’s break this down step-by-step.
Chroma¶
The Chroma
class is a vector store implementation that extends the BaseVectorStore
class. It provides functionality for storing, retrieving, and querying vector embeddings using various client types.
You can visit the Chroma Vector DB site at here.
Initialization¶
The Chroma
class can be initialized with various parameters to configure its behavior and connection settings. Here’s an overview of the initialization process:
from autorag.vectordb.chroma import Chroma
chroma = Chroma(
embedding_model: str,
collection_name: str,
embedding_batch: int = 100,
client_type: str = "persistent",
similarity_metric: str = "cosine",
path: str = None,
host: str = "localhost",
port: int = 8000,
ssl: bool = False,
headers: Optional[Dict[str, str]] = None,
api_key: Optional[str] = None,
tenant: str = DEFAULT_TENANT,
database: str = DEFAULT_DATABASE,
)
Key Parameters:¶
embedding_model
: The name or identifier of the embedding model to use.collection_name
: The name of the collection to store embeddings.client_type
: The type of client to use (options: “ephemeral”, “persistent”, “http”, “cloud”).similarity_metric
: The metric used for similarity calculations (default: “cosine”, “ip”, “l2”).path
: The path for persistent storage (required for persistent client).host
,port
,ssl
,headers
: Configuration for HTTP client.api_key
: API key for cloud client.tenant
,database
: Tenant and database identifiers.
Initialization with YAML Configuration¶
You can initialize the Chroma
class using a YAML configuration file. Here’s an example of how to structure the YAML file:
vectordb:
- name: chroma_default
db_type: chroma
client_type: persistent
embedding_model: mock
collection_name: openai
path: ${PROJECT_DIR}/resources/chroma
Client Types¶
The Chroma
class supports four different client types, each with its own use case and configuration:
Ephemeral Client
Use case: Temporary in-memory storage, useful for testing or short-lived operations.
Initialization:
vectordb: - name: chroma_ephemeral db_type: chroma client_type: ephemeral embedding_model: openai collection_name: openai
Persistent Client
Use case: Local persistent storage, ideal for single-machine applications.
Initialization:
vectordb: - name: chroma_persistent db_type: chroma client_type: persistent embedding_model: openai collection_name: openai path: ${PROJECT_DIR}/resources/chroma
HTTP Client
Use case: Connect to a remote Chroma server over HTTP/HTTPS.
Initialization:
vectordb: - name: chroma_http db_type: chroma client_type: http embedding_model: openai collection_name: openai host: http://localhost port: 8000 ssl: False
Cloud Client
Use case: Connect to a managed Chroma cloud service.
Initialization:
vectordb: - name: chroma_cloud db_type: chroma client_type: cloud embedding_model: openai collection_name: openai api_key: YOUR_API_KEY