2. Retrieval¶
🔎 Definition¶
The retrieval process involves using queries to fetch relevant content, identifiers (IDs), and scores from a corpus. This is a fundamental operation in RAG, where the aim is to find the most relevant information based on the user’s query.
🔢 Parameters¶
Overview¶
This document serves as a guide for configuring parameters, strategies, and the YAML file for various nodes within a system.
Node Parameters¶
Top_k
Description: The
top_k
parameter is used at the node level to define the top ‘k’ results to be retrieved from corpus.
Strategy Parameters¶
Metrics:
Types:
retrieval_f1
,retrieval_recall
,retrieval_precision
Purpose
These metrics are used to evaluate the effectiveness of the retrieval process, measuring the accuracy, recall, and precision of the retrieved content.
Speed Threshold:
Description:
speed_threshold
is applied across all nodes, ensuring that any method exceeding the average processing time for a query is not used.
Example config.yaml file¶
- node_line_name: retrieve_node_line
nodes:
- node_type: retrieval
strategy:
metrics: [retrieval_f1, retrieval_recall, retrieval_precision]
speed_threshold: 10
top_k: 10
modules:
- module_type: bm25
- module_type: vectordb
embedding_model: openai
- module_type: hybrid_rrf
weight_range: (4, 80)
- module_type: hybrid_cc
normalize_method: [ mm, tmm, z, dbsf ]
weight_range: (0.0, 1.0)
test_weight_size: 51