2. Retrieval

🔎 Definition

The retrieval process involves using queries to fetch relevant content, identifiers (IDs), and scores from a corpus. This is a fundamental operation in RAG, where the aim is to find the most relevant information based on the user’s query.

🔢 Parameters

Overview

This section explains how to configure the node parameters, the strategy parameters, and the YAML file for the retrieval node.

Node Parameters

top_k

Description: The top_k parameter is set at the node level and defines how many of the highest-scoring results (the top 'k') are retrieved from the corpus for each query.
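To make the effect of top_k concrete, here is a minimal Python sketch of top-k selection. It is not the library's implementation: the function name retrieve_top_k and the sample scores are hypothetical, and it assumes each passage already has a relevance score for the query.

```python
import heapq


def retrieve_top_k(scores: dict[str, float], top_k: int) -> tuple[list[str], list[float]]:
    """Return the IDs and scores of the top_k highest-scoring passages.

    `scores` maps a passage ID to its relevance score (hypothetical input).
    """
    best = heapq.nlargest(top_k, scores.items(), key=lambda item: item[1])
    ids = [doc_id for doc_id, _ in best]
    top_scores = [score for _, score in best]
    return ids, top_scores


# Hypothetical scores for four passages.
scores = {"doc-a": 0.12, "doc-b": 0.87, "doc-c": 0.45, "doc-d": 0.66}
ids, top_scores = retrieve_top_k(scores, top_k=2)
print(ids, top_scores)  # ['doc-b', 'doc-d'] [0.87, 0.66]
```

If top_k is larger than the corpus, this sketch simply returns every passage, ordered by score.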

Strategy Parameters

Metrics:
- Types: retrieval_f1, retrieval_recall, retrieval_precision
- Purpose: These metrics evaluate the effectiveness of the retrieval process by measuring the F1 score, recall, and precision of the retrieved content.

Speed Threshold:
- Description: speed_threshold applies to all nodes. Any module whose average processing time per query exceeds the threshold is not used.
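The strategy parameters above can be sketched in Python as follows. This is an illustration under stated assumptions, not the library's code: the ground truth is simplified to a flat list of relevant IDs, the helper names are hypothetical, the timings are invented, and the threshold is assumed to be in seconds with an inclusive comparison.

```python
def retrieval_precision(retrieved: list[str], relevant: list[str]) -> float:
    """Fraction of retrieved IDs that are relevant."""
    retrieved_set = set(retrieved)
    if not retrieved_set:
        return 0.0
    return len(retrieved_set & set(relevant)) / len(retrieved_set)


def retrieval_recall(retrieved: list[str], relevant: list[str]) -> float:
    """Fraction of relevant IDs that were retrieved."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(set(retrieved) & relevant_set) / len(relevant_set)


def retrieval_f1(retrieved: list[str], relevant: list[str]) -> float:
    """Harmonic mean of precision and recall."""
    p = retrieval_precision(retrieved, relevant)
    r = retrieval_recall(retrieved, relevant)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def filter_by_speed(avg_seconds: dict[str, float], speed_threshold: float) -> list[str]:
    """Keep only the modules whose average time per query is within the threshold.

    Assumption: the threshold is in seconds and the comparison is inclusive.
    """
    return [name for name, t in avg_seconds.items() if t <= speed_threshold]


# Hypothetical retrieval result and ground truth: 2 of 4 retrieved IDs are relevant.
retrieved = ["a", "b", "c", "d"]
relevant = ["b", "d", "e"]
precision = retrieval_precision(retrieved, relevant)  # 2 hits / 4 retrieved
recall = retrieval_recall(retrieved, relevant)        # 2 hits / 3 relevant
f1 = retrieval_f1(retrieved, relevant)

# Hypothetical average timings per module, in seconds.
timings = {"bm25": 0.4, "vectordb": 2.5, "hybrid_cc": 12.0}
kept = filter_by_speed(timings, speed_threshold=10)
print(precision, kept)  # 0.5 ['bm25', 'vectordb']
```

In this sketch, a module that is filtered out by the speed threshold is never compared on the metrics, no matter how well it scores.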

Example config.yaml file

- node_line_name: retrieve_node_line
  nodes:
    - node_type: retrieval
      strategy:
        metrics: [retrieval_f1, retrieval_recall, retrieval_precision]
        speed_threshold: 10
      top_k: 10
      modules:
        - module_type: bm25
        - module_type: vectordb
          embedding_model: openai
        - module_type: hybrid_rrf
          weight_range: (4, 80)
        - module_type: hybrid_cc
          normalize_method: [ mm, tmm, z, dbsf ]
          weight_range: (0.0, 1.0)
          test_weight_size: 51
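One plausible reading of the hybrid_cc settings above is sketched below in Python. It rests on several assumptions that the config alone does not confirm: that test_weight_size candidate weights are spaced evenly over weight_range, that mm means min-max normalization, and that the weight is applied to the semantic (vectordb) score with the remainder going to the lexical (bm25) score. All function names and sample scores are hypothetical.

```python
def candidate_weights(weight_range: tuple[float, float], test_weight_size: int) -> list[float]:
    """Evenly spaced weights over weight_range, endpoints included (assumed behavior)."""
    low, high = weight_range
    if test_weight_size == 1:
        return [low]
    step = (high - low) / (test_weight_size - 1)
    return [low + i * step for i in range(test_weight_size)]


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def convex_combination(semantic: dict[str, float], lexical: dict[str, float],
                       weight: float) -> dict[str, float]:
    """Fuse two score sets as weight * semantic + (1 - weight) * lexical.

    A passage missing from one retriever contributes 0.0 for that retriever.
    """
    sem, lex = min_max(semantic), min_max(lexical)
    return {
        doc_id: weight * sem.get(doc_id, 0.0) + (1 - weight) * lex.get(doc_id, 0.0)
        for doc_id in set(sem) | set(lex)
    }


weights = candidate_weights((0.0, 1.0), 51)
print(len(weights))  # 51

# Hypothetical raw scores from a vector search and from BM25.
semantic = {"a": 0.9, "b": 0.5, "c": 0.1}
lexical = {"a": 2.0, "b": 10.0, "d": 6.0}
fused = convex_combination(semantic, lexical, weight=0.5)
best_id = max(fused, key=fused.get)
print(best_id)  # b
```

Under these assumptions, weight_range (0.0, 1.0) with test_weight_size 51 yields candidate weights in steps of 0.02, each of which would be evaluated with the configured metrics.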
