autorag.nodes.queryexpansion package

Submodules

autorag.nodes.queryexpansion.base module

autorag.nodes.queryexpansion.base.make_generator_callable_param(generator_dict: Dict | None)
autorag.nodes.queryexpansion.base.query_expansion_node(func)

autorag.nodes.queryexpansion.hyde module

autorag.nodes.queryexpansion.hyde.hyde(queries: List[str], generator_func: Callable, generator_params: Dict, prompt: str = 'Please write a passage to answer the question') → List[List[str]]

HyDE, inspired by “Precise Zero-shot Dense Retrieval without Relevance Labels” (https://arxiv.org/pdf/2212.10496.pdf). For each query, an LLM first writes a hypothetical passage. That passage is then used as the query to retrieve real passages.

Parameters:
  • queries – List[str], queries to retrieve passages for.

  • generator_func – Callable, generator function.

  • generator_params – Dict, generator parameters.

  • prompt – str, prompt to use when generating the hypothetical passage.

Returns:

List[List[str]], list of HyDE results.
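The following is a minimal sketch of the HyDE flow. It assumes that generator_func takes a list of prompts plus keyword parameters and returns one string per prompt. The prompt layout (the `Question:` / `Passage:` cues) and the names hyde_sketch and fake_generator are hypothetical. This is not AutoRAG's implementation.

```python
from typing import Callable, Dict, List


def hyde_sketch(
    queries: List[str],
    generator_func: Callable[..., List[str]],
    generator_params: Dict,
    prompt: str = "Please write a passage to answer the question",
) -> List[List[str]]:
    # Assumed prompt layout: instruction, then the question, then a "Passage:" cue.
    full_prompts = [f"{prompt}\nQuestion: {query}\nPassage:" for query in queries]
    # Assumed generator contract: one generated string per prompt.
    passages = generator_func(full_prompts, **generator_params)
    # Each hypothetical passage becomes the single retrieval query for its input query.
    return [[passage] for passage in passages]


def fake_generator(prompts: List[str], max_tokens: int = 64) -> List[str]:
    # Hypothetical stand-in for an LLM: turns the question line into a "passage".
    return [p.splitlines()[1].replace("Question: ", "A passage about: ") for p in prompts]


print(hyde_sketch(["What is HyDE?"], fake_generator, {"max_tokens": 32}))
```

Running this prints `[['A passage about: What is HyDE?']]`. In a real pipeline, the returned passage would be sent to the retrieval module in place of the original question.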

autorag.nodes.queryexpansion.multi_query_expansion module

autorag.nodes.queryexpansion.multi_query_expansion.get_multi_query_expansion(query: str, answer: str) → List[str]
autorag.nodes.queryexpansion.multi_query_expansion.multi_query_expansion(queries: List[str], generator_func: Callable, generator_params: Dict, prompt: str = 'You are an AI language model assistant.\n    Your task is to generate 3 different versions of the given user\n    question to retrieve relevant documents from a vector  database.\n    By generating multiple perspectives on the user question,\n    your goal is to help the user overcome some of the limitations\n    of distance-based similarity search. Provide these alternative\n    questions separated by newlines. Original question: {question}') → List[List[str]]

Expand a list of queries using a multi-query expansion approach. The LLM generates three different versions of each input query.

Parameters:
  • queries – List[str], queries to expand.

  • generator_func – Callable, generator function.

  • generator_params – Dict, generator parameters.

  • prompt – str, prompt to use for multi-query expansion. The default prompt is taken from the default query prompt of LangChain's MultiQueryRetriever.

Returns:

List[List[str]], list of expanded queries for each input query.
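This sketch shows the expansion loop under stated assumptions. The `{question}` placeholder is filled with `str.format`, and the newline-separated answer is split into individual queries. Keeping the original query first in each list is an assumption about get_multi_query_expansion, not documented behaviour. SHORT_PROMPT is a shortened, hypothetical stand-in for the default prompt. All function names here are illustrative.

```python
from typing import Callable, Dict, List

# Shortened stand-in for the default prompt; like the default, it has one {question} slot.
SHORT_PROMPT = (
    "Generate 3 different versions of the given user question, "
    "separated by newlines. Original question: {question}"
)


def parse_expansion(query: str, answer: str) -> List[str]:
    # Assumption: keep the original query first, then each non-empty answer line.
    variants = [line.strip() for line in answer.split("\n") if line.strip()]
    return [query] + variants


def multi_query_expansion_sketch(
    queries: List[str],
    generator_func: Callable[..., List[str]],
    generator_params: Dict,
    prompt: str = SHORT_PROMPT,
) -> List[List[str]]:
    # str.format fills the {question} placeholder for each input query.
    full_prompts = [prompt.format(question=query) for query in queries]
    answers = generator_func(full_prompts, **generator_params)
    return [parse_expansion(q, a) for q, a in zip(queries, answers)]


def fake_generator(prompts: List[str], **params) -> List[str]:
    # Hypothetical LLM stand-in: returns three newline-separated rewrites per prompt.
    return ["rewrite one\nrewrite two\n\nrewrite three" for _ in prompts]
```

For example, `multi_query_expansion_sketch(["What is RAG?"], fake_generator, {})` returns `[["What is RAG?", "rewrite one", "rewrite two", "rewrite three"]]`. Blank lines in the LLM output are dropped.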

autorag.nodes.queryexpansion.pass_query_expansion module

autorag.nodes.queryexpansion.pass_query_expansion.pass_query_expansion(queries: List[str])

Performs no query expansion and returns the input queries unchanged. The result is a 2-D list (one inner list per query), and the column name is ‘queries’.
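A one-function sketch of what "2-D list" means here. Each query is wrapped in its own list, giving the same shape that the expanding modules return. The name pass_query_expansion_sketch is hypothetical.

```python
from typing import List


def pass_query_expansion_sketch(queries: List[str]) -> List[List[str]]:
    # No expansion: each query is wrapped in its own one-element list,
    # giving the same 2-D shape the expanding modules return.
    return [[query] for query in queries]


print(pass_query_expansion_sketch(["What is RAG?", "Who wrote Hamlet?"]))
```

This prints `[['What is RAG?'], ['Who wrote Hamlet?']]`. Because the shape matches the other modules, downstream retrieval can treat the pass-through baseline like any other expansion result.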

autorag.nodes.queryexpansion.query_decompose module

autorag.nodes.queryexpansion.query_decompose.get_query_decompose(query: str, answer: str) → List[str]

Decompose a query into smaller sub-questions.

Parameters:
  • query – str, query to decompose.

  • answer – str, answer generated within the query_decompose function.

Returns:

List[str], list of decomposed sub-questions. Returns the input query if it is not decomposable.
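The default prompt asks the LLM for numbered lines such as `1: ...`, or for the phrase "The question needs no decomposition". A parser for that answer format might look like the following. This is a sketch inferred from the prompt's examples, not AutoRAG's actual parsing code, and get_query_decompose_sketch is a hypothetical name.

```python
import re
from typing import List

NO_DECOMPOSITION = "The question needs no decomposition"
# Matches numbered lines such as "1: What is the minimum driving age in the US?"
NUMBERED_LINE = re.compile(r"^\s*\d+\s*:\s*(.+?)\s*$", re.MULTILINE)


def get_query_decompose_sketch(query: str, answer: str) -> List[str]:
    # The default prompt tells the LLM to emit this phrase when no split is needed.
    if NO_DECOMPOSITION.lower() in answer.lower():
        return [query]
    sub_questions = NUMBERED_LINE.findall(answer)
    # Fall back to the input query if the answer has no numbered lines.
    return sub_questions or [query]
```

With the prompt's "Example 5" answer, the parser returns the two sub-questions about hydrogen and the Spice Girls. With "The question needs no decomposition", it returns the original query unchanged.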

autorag.nodes.queryexpansion.query_decompose.query_decompose(queries: List[str], generator_func: Callable, generator_params: Dict, prompt: str = 'Decompose a question in self-contained sub-questions. Use "The question needs no decomposition" when no decomposition is needed.\n\n    Example 1:\n\n    Question: Is Hamlet more common on IMDB than Comedy of Errors?\n    Decompositions:\n    1: How many listings of Hamlet are there on IMDB?\n    2: How many listing of Comedy of Errors is there on IMDB?\n\n    Example 2:\n\n    Question: Are birds important to badminton?\n\n    Decompositions:\n    The question needs no decomposition\n\n    Example 3:\n\n    Question: Is it legal for a licensed child driving Mercedes-Benz to be employed in US?\n\n    Decompositions:\n    1: What is the minimum driving age in the US?\n    2: What is the minimum age for someone to be employed in the US?\n\n    Example 4:\n\n    Question: Are all cucumbers the same texture?\n\n    Decompositions:\n    The question needs no decomposition\n\n    Example 5:\n\n    Question: Hydrogen\'s atomic number squared exceeds number of Spice Girls?\n\n    Decompositions:\n    1: What is the atomic number of hydrogen?\n    2: How many Spice Girls are there?\n\n    Example 6:\n\n    Question: {question}\n\n    Decompositions:\n    ') → List[List[str]]

Decompose each query into smaller sub-questions.

Parameters:
  • queries – List[str], queries to decompose.

  • generator_func – Callable, generator function.

  • generator_params – Dict, generator parameters.

  • prompt – str, prompt to use for query decomposition. The default prompt comes from Visconde’s StrategyQA few-shot prompt.

Returns:

List[List[str]], list of decomposed queries. Returns the input query if it is not decomposable.

autorag.nodes.queryexpansion.run module

autorag.nodes.queryexpansion.run.evaluate_one_query_expansion_node(retrieval_funcs: List[Callable], retrieval_params: List[Dict], metric_inputs: List[MetricInput], metrics: List[str], project_dir, previous_result: DataFrame, strategy_name: str) → DataFrame
autorag.nodes.queryexpansion.run.make_retrieval_callable_params(strategy_dict: Dict)

An example strategy_dict:

{
  "metrics": ["retrieval_f1", "retrieval_recall"],
  "top_k": 50,
  "retrieval_modules": [
    {"module_type": "bm25"},
    {"module_type": "vectordb", "embedding_model": ["openai", "huggingface"]}
  ]
}
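This sketch shows one way such a strategy_dict could be expanded into individual retrieval configurations. It rests on two assumptions: list-valued parameters (like `embedding_model`) are treated as alternatives and combined with `itertools.product`, and `top_k` is copied into every configuration. The bm25 fallback follows the run_query_expansion_node description. The function name expand_retrieval_strategy is hypothetical and is not the signature of make_retrieval_callable_params.

```python
from itertools import product
from typing import Dict, List


def expand_retrieval_strategy(strategy_dict: Dict) -> List[Dict]:
    # Default to bm25 when no retrieval_modules are given.
    modules = strategy_dict.get("retrieval_modules") or [{"module_type": "bm25"}]
    configs = []
    for module in modules:
        keys = list(module.keys())
        # Treat list values as alternatives; scalars become one-element lists.
        value_lists = [v if isinstance(v, list) else [v] for v in module.values()]
        for combo in product(*value_lists):
            config = dict(zip(keys, combo))
            if "top_k" in strategy_dict:
                config["top_k"] = strategy_dict["top_k"]
            configs.append(config)
    return configs
```

Under these assumptions, the example strategy_dict yields three configurations: bm25, vectordb with "openai", and vectordb with "huggingface", each with top_k 50.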
autorag.nodes.queryexpansion.run.run_query_expansion_node(modules: List[Callable], module_params: List[Dict], previous_result: DataFrame, node_line_dir: str, strategies: Dict) → DataFrame

Run evaluation and select the best module among the query expansion node results. First, retrieval is run with expanded_queries, the output of each query expansion module. Retrieval is run for every combination of the retrieval_modules in strategies. If multiple retrieval_modules are given, all of them are run and the best result is kept. If none are given, bm25 is used by default. This selects the best retrieval result for each query expansion module, and then the best module across all query expansion modules.

Parameters:
  • modules – Query expansion modules to run.

  • module_params – Query expansion module parameters.

  • previous_result – Previous result dataframe. In this case, it is the QA data.

  • node_line_dir – This node line’s directory.

  • strategies – Strategies for query expansion node.

Returns:

The best result dataframe.
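The two-stage selection described above can be sketched with plain dictionaries. This sketch assumes that each run is scored by the unweighted mean of its metric values. That scoring rule, the nested-dict layout, and the name select_best are all hypothetical, since the actual strategy options are not specified here.

```python
from statistics import mean
from typing import Dict, Tuple


def select_best(scores: Dict[str, Dict[str, Dict[str, float]]]) -> Tuple[str, str]:
    """scores[expansion_module][retrieval_config] -> {metric_name: value}."""
    best_per_module = {}
    for module_name, by_retrieval in scores.items():
        # Stage 1: best retrieval config for this query expansion module (assumed mean score).
        best_cfg = max(by_retrieval, key=lambda cfg: mean(by_retrieval[cfg].values()))
        best_per_module[module_name] = (best_cfg, mean(by_retrieval[best_cfg].values()))
    # Stage 2: best query expansion module across all modules.
    best_module = max(best_per_module, key=lambda m: best_per_module[m][1])
    return best_module, best_per_module[best_module][0]
```

The function returns the winning query expansion module together with the retrieval configuration that produced its best score.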

Module contents