Certainly! I'll create documentation for the `Chroma` class, including information about initializing it with a YAML configuration file and explaining the differences between the various client types. Let's break this down step-by-step. # Chroma The `Chroma` class is a vector store implementation that extends the `BaseVectorStore` class. It provides functionality for storing, retrieving, and querying vector embeddings using various client types. You can visit the Chroma Vector DB site at [here](https://www.trychroma.com/). ### Initialization The `Chroma` class can be initialized with various parameters to configure its behavior and connection settings. Here's an overview of the initialization process: ```python from autorag.vectordb.chroma import Chroma chroma = Chroma( embedding_model: str, collection_name: str, embedding_batch: int = 100, client_type: str = "persistent", similarity_metric: str = "cosine", path: str = None, host: str = "localhost", port: int = 8000, ssl: bool = False, headers: Optional[Dict[str, str]] = None, api_key: Optional[str] = None, tenant: str = DEFAULT_TENANT, database: str = DEFAULT_DATABASE, ) ``` #### Key Parameters: - `embedding_model`: The name or identifier of the embedding model to use. - `collection_name`: The name of the collection to store embeddings. - `client_type`: The type of client to use (options: "ephemeral", "persistent", "http", "cloud"). - `similarity_metric`: The metric used for similarity calculations (default: "cosine", "ip", "l2"). - `path`: The path for persistent storage (required for persistent client). - `host`, `port`, `ssl`, `headers`: Configuration for HTTP client. - `api_key`: API key for cloud client. - `tenant`, `database`: Tenant and database identifiers. ### Initialization with YAML Configuration You can initialize the `Chroma` class using a YAML configuration file. Here's an example of how to structure the YAML file: ```yaml vectordb: - name: chroma_default db_type: chroma client_type: persistent embedding_model: mock collection_name: openai path: ${PROJECT_DIR}/resources/chroma ``` ### Client Types The `Chroma` class supports four different client types, each with its own use case and configuration: 1. **Ephemeral Client** - Use case: Temporary in-memory storage, useful for testing or short-lived operations. - Initialization: ```yaml vectordb: - name: chroma_ephemeral db_type: chroma client_type: ephemeral embedding_model: openai collection_name: openai ``` 2. **Persistent Client** - Use case: Local persistent storage, ideal for single-machine applications. - Initialization: ```yaml vectordb: - name: chroma_persistent db_type: chroma client_type: persistent embedding_model: openai collection_name: openai path: ${PROJECT_DIR}/resources/chroma ``` 3. **HTTP Client** - Use case: Connect to a remote Chroma server over HTTP/HTTPS. - Initialization: ```yaml vectordb: - name: chroma_http db_type: chroma client_type: http embedding_model: openai collection_name: openai host: http://localhost port: 8000 ssl: False ``` 4. **Cloud Client** - Use case: Connect to a managed Chroma cloud service. - Initialization: ```yaml vectordb: - name: chroma_cloud db_type: chroma client_type: cloud embedding_model: openai collection_name: openai api_key: YOUR_API_KEY ```