VectorDB

The VectorDB module is a retrieval module that uses a vector database as its backend. It makes dense retrieval easy to use.

It first embeds each passage's content with an embedding model, then stores the resulting vectors in the vector database. At retrieval time, it embeds the query, searches the database for the most similar vectors, and returns the passages corresponding to those vectors.
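
Below is a minimal sketch of this ingest-and-retrieve flow, using ChromaDB directly. For simplicity it relies on ChromaDB's built-in default embedding function instead of a configurable embedding_model, and the collection name "passages" is arbitrary.

import chromadb

# In-memory client; use chromadb.PersistentClient(path=...) to persist data.
client = chromadb.Client()
collection = client.create_collection(name="passages")

# Ingestion: store passages; ChromaDB embeds them with its default model here.
collection.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "ChromaDB is a local, open-source vector database.",
        "Dense retrieval compares query and passage embeddings.",
    ],
)

# Retrieval: embed the query and return the most similar passage.
results = collection.query(query_texts=["What is ChromaDB?"], n_results=1)
print(results["documents"][0])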

Backend Support

As of now, the VectorDB module supports only ChromaDB as its backend database. We chose ChromaDB because it is a local vector database that requires no internet connection, server fees, or API keys. It is also open-source software.

Module Parameters

  • Parameter: embedding_model

  • Usage: Specifies the embedding model used by the VectorDB module. It determines how passages and queries are represented as vectors, and therefore which passages are retrieved.

Tip

Information about embedding models can be found in Supporting Embedding models. You can also learn how to add a custom embedding model here.

  • Parameter: embedding_batch

  • Usage: The batch size used when embedding passages. It is applied automatically during the ingestion process that uses the embedding model. If the embedding model raises errors (for example, rate limits or out-of-memory), try lowering this value; see the sketch after this list.
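
The following is a hypothetical illustration of how such a batch size is typically applied during ingestion; embed_in_batches and embed_fn are illustrative names for this sketch, not part of the module's API.

def embed_in_batches(texts, embed_fn, embedding_batch=64):
    # Embed texts in chunks of `embedding_batch` items per model call.
    vectors = []
    for start in range(0, len(texts), embedding_batch):
        batch = texts[start:start + embedding_batch]
        # One call per batch: a smaller batch size means smaller requests,
        # which helps avoid rate-limit or out-of-memory errors.
        vectors.extend(embed_fn(batch))
    return vectors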

Example config.yaml

modules:
  - module_type: vectordb
    embedding_model: openai
    embedding_batch: 64