autorag.utils package

Submodules

autorag.utils.preprocess module

autorag.utils.preprocess.cast_corpus_dataset(df: DataFrame)
autorag.utils.preprocess.cast_qa_dataset(df: DataFrame)
autorag.utils.preprocess.validate_corpus_dataset(df: DataFrame)
autorag.utils.preprocess.validate_qa_dataset(df: DataFrame)
autorag.utils.preprocess.validate_qa_from_corpus_dataset(qa_df: DataFrame, corpus_df: DataFrame)

autorag.utils.util module

autorag.utils.util.convert_datetime_string(s)
autorag.utils.util.convert_env_in_dict(d: Dict)

Recursively replaces environment variable strings in a dictionary with the values of the corresponding environment variables.

Parameters:

d – The dictionary to convert.

Returns:

The converted dictionary.
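The recursive substitution can be sketched as follows. This is a minimal illustration, not the library's implementation: the `${NAME}` placeholder syntax, the helper name `expand_env`, and the choice to replace unset variables with an empty string are all assumptions, since the description above does not specify them.

```python
import os
import re
from typing import Any

# Assumed placeholder syntax: ${NAME}. The documentation does not specify it.
_ENV_PATTERN = re.compile(r"\$\{(\w+)\}")


def expand_env(value: Any) -> Any:
    """Recursively replace ${NAME} placeholders in strings with os.environ values."""
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    if isinstance(value, str):
        # Assumption: unset variables become an empty string in this sketch.
        return _ENV_PATTERN.sub(lambda m: os.environ.get(m.group(1), ""), value)
    return value


os.environ["DEMO_API_KEY"] = "secret"  # hypothetical variable, set only for the demo
config = {"llm": {"api_key": "${DEMO_API_KEY}", "retries": 3}}
expanded = expand_env(config)
print(expanded)
```

Non-string values such as `retries` pass through unchanged, and the input dictionary is not modified.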

autorag.utils.util.convert_inputs_to_list(func)

Decorator to convert all function inputs to Python lists.

autorag.utils.util.convert_string_to_tuple_in_dict(d)

Recursively converts strings that start with ‘(’ and end with ‘)’ to tuples in a dictionary.
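A rough sketch of this kind of conversion, assuming the string is parsed with `ast.literal_eval` (the parsing method actually used is not stated above, and `strings_to_tuples` is a hypothetical name):

```python
import ast
from typing import Any


def strings_to_tuples(value: Any) -> Any:
    """Recursively turn strings such as "(1, 2)" into tuples; leave the rest as-is."""
    if isinstance(value, dict):
        return {key: strings_to_tuples(item) for key, item in value.items()}
    if isinstance(value, list):
        return [strings_to_tuples(item) for item in value]
    if isinstance(value, str) and value.startswith("(") and value.endswith(")"):
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return value  # not a valid Python literal, keep the string
        # "(5)" evaluates to an int, so only accept real tuples.
        return parsed if isinstance(parsed, tuple) else value
    return value


params = {"weights": "(0.3, 0.7)", "nested": {"pair": "(1, 2)"}, "name": "bm25"}
converted = strings_to_tuples(params)
print(converted)
```

This pattern is useful when tuples have to travel through a format such as YAML or JSON that has no tuple type.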

autorag.utils.util.dict_to_markdown(d, level=1)

Convert a dictionary to a Markdown formatted string.

Parameters:
  • d – Dictionary to convert

  • level – Current level of heading (used for nested dictionaries)

Returns:

Markdown formatted string
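To illustrate how `level` could drive heading depth for nested dictionaries, here is a minimal sketch. The exact output layout of the real function is not documented above, so the format below (one heading per key, plain text for leaf values) and the name `to_markdown` are assumptions.

```python
def to_markdown(d: dict, level: int = 1) -> str:
    """Render each key as a heading; nested dicts use one more '#' per level."""
    lines = []
    for key, value in d.items():
        lines.append(f"{'#' * level} {key}")
        if isinstance(value, dict):
            lines.append(to_markdown(value, level + 1))
        else:
            lines.append(str(value))
    return "\n".join(lines)


md = to_markdown({"Retrieval": {"top_k": 5}, "Notes": "baseline"})
print(md)
```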

autorag.utils.util.dict_to_markdown_table(data, key_column_name: str, value_column_name: str)
autorag.utils.util.embedding_query_content(queries: List[str], contents_list: List[List[str]], embedding_model: str | None = None, batch: int = 128)
autorag.utils.util.explode(index_values: Collection[Any], explode_values: Collection[Collection[Any]])

Explode index_values and explode_values, which must have the same length. explode_values is flattened, and each flattened element stays paired with its corresponding index value.

Parameters:
  • index_values – The index values.

  • explode_values – The values to explode.

Returns:

Tuple of exploded index_values and exploded explode_values.
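The pairing behavior described above can be sketched like this. It is a standalone illustration with a hypothetical name (`explode_pairs`), not the library code; raising `ValueError` on a length mismatch is an assumption.

```python
from typing import Any, Collection, List, Tuple


def explode_pairs(
    index_values: Collection[Any],
    explode_values: Collection[Collection[Any]],
) -> Tuple[List[Any], List[Any]]:
    """Flatten explode_values and repeat each index value once per flattened element."""
    if len(index_values) != len(explode_values):
        raise ValueError("index_values and explode_values must have the same length")
    exploded_index: List[Any] = []
    exploded_values: List[Any] = []
    for index, values in zip(index_values, explode_values):
        for value in values:
            exploded_index.append(index)
            exploded_values.append(value)
    return exploded_index, exploded_values


idx, vals = explode_pairs(["q1", "q2"], [["a", "b"], ["c"]])
print(idx, vals)
```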

autorag.utils.util.fetch_contents(corpus_data: DataFrame, ids: List[List[str]], column_name: str = 'contents') → List[List[Any]]
autorag.utils.util.fetch_one_content(corpus_data: DataFrame, id_: str, column_name: str = 'contents', id_column_name: str = 'doc_id') → Any
autorag.utils.util.filter_dict_keys(dict_, keys: List[str])
autorag.utils.util.find_key_values(data, target_key: str) → List[Any]

Recursively find all values for a specific key in a nested dictionary or list.

Parameters:
  • data – The dictionary or list to search.

  • target_key – The key to search for.

Returns:

A list of values associated with the target key.
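A minimal sketch of such a recursive search follows. The name `find_values` is hypothetical, and whether the real function keeps searching inside a value that already matched is not stated above; this sketch does.

```python
from typing import Any, List


def find_values(data: Any, target_key: str) -> List[Any]:
    """Collect every value stored under target_key in nested dicts and lists."""
    found: List[Any] = []
    if isinstance(data, dict):
        for key, value in data.items():
            if key == target_key:
                found.append(value)
            # Assumption: keep descending, even into a matched value.
            found.extend(find_values(value, target_key))
    elif isinstance(data, list):
        for item in data:
            found.extend(find_values(item, target_key))
    return found


config = {
    "nodes": [
        {"module_type": "bm25"},
        {"modules": [{"module_type": "vectordb"}]},
    ]
}
module_types = find_values(config, "module_type")
print(module_types)
```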

autorag.utils.util.find_node_summary_files(trial_dir: str) → List[str]
autorag.utils.util.find_trial_dir(project_dir: str) → List[str]
autorag.utils.util.flatten_apply(func: Callable, nested_list: List[List[Any]], **kwargs) → List[List[Any]]

Flattens the input list, applies the function to the flattened elements, and then reconstructs the result into the original nested shape. The inner lists may differ in length.

Parameters:
  • func – The function that applies to the flattened list.

  • nested_list – The nested list to be flattened.

Returns:

The list that is reconstructed after applying the function.
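The flatten, apply, and reconstruct steps can be sketched as below. This assumes that `func` receives the whole flattened list and returns a list of the same length, which the description suggests but does not state outright; `flatten_then_apply` and `upper_all` are hypothetical names.

```python
from itertools import chain
from typing import Any, Callable, List


def flatten_then_apply(
    func: Callable, nested_list: List[List[Any]], **kwargs
) -> List[List[Any]]:
    """Apply func once to the flattened list, then restore the nested shape."""
    lengths = [len(inner) for inner in nested_list]
    flat = list(chain.from_iterable(nested_list))
    applied = func(flat, **kwargs)  # assumed to return len(flat) results
    result, start = [], 0
    for length in lengths:
        result.append(applied[start:start + length])
        start += length
    return result


def upper_all(texts: List[str]) -> List[str]:
    return [text.upper() for text in texts]


out = flatten_then_apply(upper_all, [["a", "b"], ["c"], []])
print(out)
```

Calling `func` once on the flat list, rather than once per inner list, is what makes this pattern a good fit for batched operations such as embedding calls.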

autorag.utils.util.get_best_row(summary_df: DataFrame, best_column_name: str = 'is_best') → Series

From the summary dataframe, find the best result row by ‘is_best’ column and return it.

Parameters:
  • summary_df – Summary dataframe created by AutoRAG.

  • best_column_name – The column name that indicates the best result. Default is ‘is_best’. You don’t have to change this unless the column name is different.

Returns:

The best row as a pandas Series.

autorag.utils.util.get_event_loop() → AbstractEventLoop

Get asyncio event loop safely.

autorag.utils.util.load_summary_file(summary_path: str, dict_columns: List[str] | None = None) → DataFrame

Load a summary file from summary_path.

Parameters:
  • summary_path – The path of the summary file.

  • dict_columns – The columns that hold dictionary values. Set this parameter so that those columns are parsed correctly when the summary file is loaded. Default is [‘module_params’].

Returns:

The summary dataframe.

autorag.utils.util.make_batch(elems: List[Any], batch_size: int) → List[List[Any]]

Split elems into batches of size batch_size.
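A short sketch of batching by slicing, assuming the final batch simply holds the leftover elements when the length is not a multiple of batch_size (`batches` is a hypothetical name):

```python
from typing import Any, List


def batches(elems: List[Any], batch_size: int) -> List[List[Any]]:
    """Split elems into consecutive chunks of at most batch_size items."""
    return [elems[i:i + batch_size] for i in range(0, len(elems), batch_size)]


chunks = batches([1, 2, 3, 4, 5], 2)
print(chunks)
```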

autorag.utils.util.make_combinations(target_dict: Dict[str, Any]) → List[Dict[str, Any]]

Make combinations from target_dict. Each key of target_dict must be a string, and each value can be a list of values or a single value. It generates all combinations of values from target_dict: every generated dictionary contains exactly one value for each key, and all generated dictionaries are different from each other.

Parameters:

target_dict – The target dictionary.

Returns:

The list of generated dictionaries.
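One way to sketch this expansion is with `itertools.product`. This is an illustration under assumptions rather than the library's code: `combinations_of` is a hypothetical name, and unlike the description above it does not remove duplicates if a value list contains repeated entries.

```python
from itertools import product
from typing import Any, Dict, List


def combinations_of(target: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Expand list-valued entries into one dict per combination of values."""
    keys = list(target)
    # Wrap single values so that every entry is a list of candidates.
    value_lists = [v if isinstance(v, list) else [v] for v in target.values()]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


combos = combinations_of({"top_k": [1, 3], "model": "bm25"})
print(combos)
```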

autorag.utils.util.normalize_string(s: str) → str

Taken from the official evaluation script for v1.1 of the SQuAD dataset. Lowercases the text and removes punctuation, articles, and extra whitespace.
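For reference, the normalization steps follow the pattern below, which mirrors the well-known SQuAD v1.1 evaluation logic. The function name `squad_normalize` is hypothetical, and the sketch is not guaranteed to match the library line for line.

```python
import re
import string


def squad_normalize(s: str) -> str:
    """Lowercase, strip punctuation, drop English articles, collapse whitespace."""

    def remove_articles(text: str) -> str:
        return re.sub(r"\b(a|an|the)\b", " ", text)

    def white_space_fix(text: str) -> str:
        return " ".join(text.split())

    def remove_punc(text: str) -> str:
        exclude = set(string.punctuation)
        return "".join(ch for ch in text if ch not in exclude)

    return white_space_fix(remove_articles(remove_punc(s.lower())))


normalized = squad_normalize("The  Quick, brown fox!")
print(normalized)
```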

autorag.utils.util.normalize_unicode(text: str) → str
autorag.utils.util.openai_truncate_by_token(texts: List[str], token_limit: int, model_name: str) → List[str]
async autorag.utils.util.process_batch(tasks, batch_size: int = 64) → List[Any]

Processes tasks in batches asynchronously.

Parameters:
  • tasks – A list of no-argument functions or coroutines to be executed.

  • batch_size – The number of tasks to process in a single batch. Default is 64.

Returns:

A list of results from the processed tasks.
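Batched asynchronous execution can be sketched with `asyncio.gather`. This is a simplified illustration: it handles coroutine objects only (not the no-argument functions mentioned above), and `run_in_batches`, `double`, and `main` are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, List


async def run_in_batches(tasks: List[Awaitable[Any]], batch_size: int = 64) -> List[Any]:
    """Await tasks batch by batch, so at most batch_size run concurrently."""
    results: List[Any] = []
    for start in range(0, len(tasks), batch_size):
        batch = tasks[start:start + batch_size]
        # gather returns results in the same order as its arguments.
        results.extend(await asyncio.gather(*batch))
    return results


async def double(x: int) -> int:
    await asyncio.sleep(0)  # stand-in for real I/O such as an API call
    return x * 2


async def main() -> List[int]:
    return await run_in_batches([double(i) for i in range(5)], batch_size=2)


doubled = asyncio.run(main())
print(doubled)
```

Limiting each `gather` call to one batch is a simple way to cap concurrency, for example to stay under an API rate limit.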

autorag.utils.util.reconstruct_list(flat_list: List[Any], lengths: List[int]) → List[List[Any]]
autorag.utils.util.replace_value_in_dict(target_dict: Dict, key: str, replace_value: Any) → Dict

Replace the value of the given key in target_dict. If the key is not present in target_dict, target_dict is returned unchanged.

Parameters:
  • target_dict – The target dictionary.

  • key – The key whose value is to be replaced.

  • replace_value – The new value to store under the key.

Returns:

The replaced dictionary.

autorag.utils.util.result_to_dataframe(column_names: List[str])

Decorator for converting results to pd.DataFrame.

autorag.utils.util.save_parquet_safe(df: DataFrame, filepath: str, upsert: bool = False)
autorag.utils.util.select_top_k(df, column_names: List[str], top_k: int)
autorag.utils.util.sort_by_scores(row, reverse=True)

Sorts each row by its ‘scores’ column. The input columns must be named ‘contents’, ‘ids’, and ‘scores’, and their elements must be lists.
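The row-wise sort can be sketched on a plain dictionary standing in for a dataframe row. The return shape (a tuple of three lists) and the name `sort_row_by_scores` are assumptions, since the return value is not documented above.

```python
from typing import Any, Dict, List, Tuple


def sort_row_by_scores(
    row: Dict[str, List[Any]], reverse: bool = True
) -> Tuple[List[Any], List[Any], List[Any]]:
    """Reorder contents, ids, and scores together, by score (descending by default)."""
    triples = sorted(
        zip(row["contents"], row["ids"], row["scores"]),
        key=lambda triple: triple[2],
        reverse=reverse,
    )
    if not triples:
        return [], [], []
    contents, ids, scores = (list(column) for column in zip(*triples))
    return contents, ids, scores


row = {"contents": ["a", "b", "c"], "ids": ["1", "2", "3"], "scores": [0.2, 0.9, 0.5]}
sorted_contents, sorted_ids, sorted_scores = sort_row_by_scores(row)
print(sorted_contents, sorted_ids, sorted_scores)
```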

autorag.utils.util.split_dataframe(df, chunk_size)
autorag.utils.util.to_list(item)

Recursively convert collections to Python lists.
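A minimal sketch of the recursion, limited to built-in lists, tuples, and ranges. The real function may cover other collection types as well; this sketch and its name `as_list` are assumptions.

```python
from typing import Any


def as_list(item: Any) -> Any:
    """Convert nested lists, tuples, and ranges to plain lists; leave other values as-is."""
    if isinstance(item, (list, tuple, range)):
        return [as_list(element) for element in item]
    return item  # strings, numbers, dicts, etc. are returned unchanged


nested = as_list((1, (2, 3), [4, (5,)]))
print(nested)
```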

Module contents