Chroma

The Chroma class is a vector store implementation that extends the BaseVectorStore class. It stores, retrieves, and queries vector embeddings through one of several Chroma client types. For more about the underlying database, see the Chroma vector database website.

Initialization

The Chroma class accepts the following parameters, which configure its behavior and connection settings:

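from chromadb.config import DEFAULT_DATABASE, DEFAULT_TENANT

from autorag.vectordb.chroma import Chroma

# Every constructor parameter; optional ones are shown with their default values.
chroma = Chroma(
    embedding_model="openai",  # str, required
    collection_name="openai",  # str, required
    embedding_batch=100,  # int
    client_type="persistent",  # str
    similarity_metric="cosine",  # str
    path="./resources/chroma",  # Optional[str], default None; example path, required for the persistent client
    host="localhost",  # str
    port=8000,  # int
    ssl=False,  # bool
    headers=None,  # Optional[Dict[str, str]]
    api_key=None,  # Optional[str]
    tenant=DEFAULT_TENANT,  # str
    database=DEFAULT_DATABASE,  # str
)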

Key Parameters:

  • embedding_model: The name or identifier of the embedding model to use.

  • collection_name: The name of the collection to store embeddings.

  • client_type: The type of client to use (options: “ephemeral”, “persistent”, “http”, “cloud”; default: “persistent”).

  • similarity_metric: The metric used for similarity calculations (options: “cosine”, “ip”, “l2”; default: “cosine”).

  • path: The path for persistent storage (required for the persistent client).

  • host, port, ssl, headers: Connection settings for the HTTP client.

  • api_key: API key for the cloud client.

  • tenant, database: Tenant and database identifiers.
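
The three similarity_metric options correspond to different distance functions. The plain-Python sketch below illustrates them. It is not the code Chroma runs: the formulas follow Chroma's distance definitions as best understood here (squared L2, one minus the inner product, one minus the cosine similarity, with smaller values meaning closer vectors), and the function names are hypothetical.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance ("l2")."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner-product distance ("ip"): 1 minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance ("cosine"): 1 minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Two vectors pointing in the same direction but with different lengths.
query = [3.0, 4.0]
doc = [6.0, 8.0]

distances = {
    "l2": l2_distance(query, doc),
    "ip": ip_distance(query, doc),
    "cosine": cosine_distance(query, doc),
}
```

Because the two vectors are parallel, the cosine distance is zero even though the L2 distance is not. This is why “cosine” is a common choice when embedding magnitude should not affect ranking.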

Initialization with YAML Configuration

You can initialize the Chroma class using a YAML configuration file. Here’s an example of how to structure the YAML file:

vectordb:
  - name: chroma_default
    db_type: chroma
    client_type: persistent
    embedding_model: mock
    collection_name: openai
    path: ${PROJECT_DIR}/resources/chroma
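
To make the mapping from a YAML entry to constructor arguments concrete, here is a minimal sketch. It assumes the YAML has already been parsed into a dictionary (for example with a YAML library), that ${PROJECT_DIR} is expanded by plain string substitution, and that name and db_type are routing keys rather than constructor arguments. The helper entry_to_kwargs is hypothetical and not part of AutoRAG; AutoRAG may resolve the file differently.

```python
from string import Template
from typing import Any, Dict

# The YAML entry above, as it would look after parsing into a dictionary.
entry = {
    "name": "chroma_default",
    "db_type": "chroma",
    "client_type": "persistent",
    "embedding_model": "mock",
    "collection_name": "openai",
    "path": "${PROJECT_DIR}/resources/chroma",
}


def entry_to_kwargs(entry: Dict[str, Any], project_dir: str) -> Dict[str, Any]:
    """Hypothetical helper: expand ${PROJECT_DIR} and drop the routing keys."""
    kwargs: Dict[str, Any] = {}
    for key, value in entry.items():
        if key in ("name", "db_type"):
            # Assumed to select the vector store, not to configure it.
            continue
        if isinstance(value, str):
            value = Template(value).safe_substitute(PROJECT_DIR=project_dir)
        kwargs[key] = value
    return kwargs


# "/home/user/project" is an example project directory.
kwargs = entry_to_kwargs(entry, project_dir="/home/user/project")
# The result could then be passed on as Chroma(**kwargs).
```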

Client Types

The Chroma class supports four different client types, each with its own use case and configuration:

  1. Ephemeral Client

    • Use case: Temporary in-memory storage, useful for testing or short-lived operations.

    • Initialization:

      vectordb:
        - name: chroma_ephemeral
          db_type: chroma
          client_type: ephemeral
          embedding_model: openai
          collection_name: openai
      
  2. Persistent Client

    • Use case: Local persistent storage, ideal for single-machine applications.

    • Initialization:

      vectordb:
        - name: chroma_persistent
          db_type: chroma
          client_type: persistent
          embedding_model: openai
          collection_name: openai
          path: ${PROJECT_DIR}/resources/chroma
      
  3. HTTP Client

    • Use case: Connect to a remote Chroma server over HTTP/HTTPS.

    • Initialization:

      vectordb:
        - name: chroma_http
          db_type: chroma
          client_type: http
          embedding_model: openai
          collection_name: openai
          host: localhost
          port: 8000
          ssl: false
      
  4. Cloud Client

    • Use case: Connect to a managed Chroma cloud service.

    • Initialization:

      vectordb:
        - name: chroma_cloud
          db_type: chroma
          client_type: cloud
          embedding_model: openai
          collection_name: openai
          api_key: YOUR_API_KEY
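
The four client types differ mainly in which connection parameters they use. The sketch below summarizes this as a lookup table and selects the relevant settings for a given client type. The grouping follows the parameter descriptions earlier on this page. The assumption that tenant and database apply to every client type, and the helper name select_client_params, are illustrative rather than AutoRAG's actual implementation.

```python
from typing import Any, Dict, List

# Connection parameters each client type relies on (assumed grouping).
CLIENT_PARAMS: Dict[str, List[str]] = {
    "ephemeral": [],
    "persistent": ["path"],
    "http": ["host", "port", "ssl", "headers"],
    "cloud": ["api_key"],
}
# Assumed to be accepted by every client type.
COMMON_PARAMS = ["tenant", "database"]


def select_client_params(client_type: str, config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: keep only the settings the chosen client type uses."""
    if client_type not in CLIENT_PARAMS:
        raise ValueError(
            f"Unknown client_type {client_type!r}; expected one of {sorted(CLIENT_PARAMS)}"
        )
    wanted = CLIENT_PARAMS[client_type] + COMMON_PARAMS
    return {key: config[key] for key in wanted if key in config}


config = {
    "host": "localhost",
    "port": 8000,
    "ssl": False,
    "path": "./resources/chroma",
    "api_key": "YOUR_API_KEY",
}

http_params = select_client_params("http", config)
persistent_params = select_client_params("persistent", config)
```

In this sketch, settings that belong to other client types are silently ignored, so one configuration dictionary can be reused while switching client_type between local development and a remote server.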