autorag.utils package¶

Submodules¶

autorag.utils.preprocess module¶

autorag.utils.preprocess.cast_corpus_dataset(df: DataFrame)[source]¶

autorag.utils.preprocess.cast_qa_dataset(df: DataFrame)[source]¶

autorag.utils.preprocess.validate_corpus_dataset(df: DataFrame)[source]¶

autorag.utils.preprocess.validate_qa_dataset(df: DataFrame)[source]¶

autorag.utils.preprocess.validate_qa_from_corpus_dataset(qa_df: DataFrame, corpus_df: DataFrame)[source]¶

autorag.utils.util module¶

async autorag.utils.util.aflatten_apply(func: Callable, nested_list: List[List[Any]], **kwargs) → List[List[Any]][source]¶

This function flattens the input list, applies the function to the flattened elements, and then reconstructs the list in its original shape. The inner lists may have different lengths from each other.

Parameters:

func – The function that applies to the flattened list.
nested_list – The nested list to be flattened.

Returns:

The list that is reconstructed after applying the function.
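The flatten, apply, and reconstruct steps described above can be sketched as follows. This is a standalone illustration, not AutoRAG's source: the name `aflatten_apply_sketch` is hypothetical, and it assumes `func` is an async callable that takes the flat list and returns a list of the same length.

```python
import asyncio
from itertools import chain
from typing import Any, Callable, List


async def aflatten_apply_sketch(
    func: Callable, nested_list: List[List[Any]], **kwargs
) -> List[List[Any]]:
    # Remember each inner length so the original shape can be restored.
    lengths = [len(inner) for inner in nested_list]
    flat = list(chain.from_iterable(nested_list))
    # Assumption: func is async and returns one result per flat element.
    flat_result = await func(flat, **kwargs)
    result, start = [], 0
    for length in lengths:
        result.append(flat_result[start:start + length])
        start += length
    return result


async def double_all(values, factor=2):
    # Hypothetical async function used only for this example.
    await asyncio.sleep(0)
    return [v * factor for v in values]


result = asyncio.run(aflatten_apply_sketch(double_all, [[1, 2], [3], [4, 5, 6]]))
# result == [[2, 4], [6], [8, 10, 12]]
```

Flattening first lets `func` process all elements in one call, which tends to matter when `func` batches requests internally.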

autorag.utils.util.apply_recursive(func, data)[source]¶

Recursively apply a function to all elements in a list, tuple, set, np.ndarray, or pd.Series and return the result as a list.

Parameters:

func – Function to apply to each element.
data – List or nested list.

Returns:

List with the function applied to each element.
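A minimal sketch of the recursion, restricted to built-in containers so that it runs without NumPy or pandas (the documented function also accepts np.ndarray and pd.Series). The name `apply_recursive_sketch` is hypothetical.

```python
def apply_recursive_sketch(func, data):
    # Containers are walked element by element and always come back as lists.
    if isinstance(data, (list, tuple, set)):
        return [apply_recursive_sketch(func, item) for item in data]
    # Anything else is treated as a leaf value.
    return func(data)


upper = apply_recursive_sketch(str.upper, ["a", ("b", ["c"])])
# upper == ["A", ["B", ["C"]]]
```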

autorag.utils.util.convert_datetime_string(s)[source]¶

autorag.utils.util.convert_env_in_dict(d: Dict)[source]¶

Recursively converts environment variable strings in a dictionary to the actual environment variable values.

Parameters:

d – The dictionary to convert.

Returns:

The converted dictionary.
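The idea can be sketched as below. The exact placeholder syntax that AutoRAG recognizes is not stated on this page, so this sketch assumes shell-style references such as `${NAME}` and relies on the standard library's `os.path.expandvars`. The names `convert_env_in_dict_sketch` and `DEMO_API_KEY` are hypothetical.

```python
import os
from typing import Any, Dict


def _expand(value: Any) -> Any:
    # Recurse into containers; expand variable references inside strings.
    if isinstance(value, dict):
        return {key: _expand(item) for key, item in value.items()}
    if isinstance(value, list):
        return [_expand(item) for item in value]
    if isinstance(value, str):
        # os.path.expandvars leaves references to unset variables unchanged.
        return os.path.expandvars(value)
    return value


def convert_env_in_dict_sketch(d: Dict) -> Dict:
    return _expand(d)


# Set here only so that the example is reproducible.
os.environ["DEMO_API_KEY"] = "secret"
config = {"llm": {"api_key": "${DEMO_API_KEY}", "retries": 3}}
converted = convert_env_in_dict_sketch(config)
# converted == {"llm": {"api_key": "secret", "retries": 3}}
```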

autorag.utils.util.convert_inputs_to_list(func)[source]¶

Decorator to convert all function inputs to Python lists.

autorag.utils.util.convert_string_to_tuple_in_dict(d)[source]¶

Recursively converts strings that start with ‘(’ and end with ‘)’ to tuples in a dictionary.
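One way to sketch this conversion is with `ast.literal_eval`, which parses Python literals without executing code. Whether AutoRAG parses the strings this way is an assumption; the helper names below are hypothetical, and strings that do not parse to a tuple are left unchanged in this sketch.

```python
import ast
from typing import Any, Dict


def _maybe_tuple(value: Any) -> Any:
    # Only strings wrapped in parentheses are candidates for conversion.
    if isinstance(value, str) and value.startswith("(") and value.endswith(")"):
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return value
        return parsed if isinstance(parsed, tuple) else value
    return value


def convert_string_to_tuple_sketch(d: Dict) -> Dict:
    converted = {}
    for key, value in d.items():
        if isinstance(value, dict):
            converted[key] = convert_string_to_tuple_sketch(value)
        elif isinstance(value, list):
            converted[key] = [
                convert_string_to_tuple_sketch(v) if isinstance(v, dict) else _maybe_tuple(v)
                for v in value
            ]
        else:
            converted[key] = _maybe_tuple(value)
    return converted


parsed_config = convert_string_to_tuple_sketch(
    {"weights": "(0.3, 0.7)", "node": {"pair": "('a', 'b')"}, "name": "plain"}
)
# parsed_config == {"weights": (0.3, 0.7), "node": {"pair": ("a", "b")}, "name": "plain"}
```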

autorag.utils.util.decode_multiple_json_from_bytes(byte_data: bytes) → list[source]¶

Decode multiple JSON objects from bytes received from an SSE server.

Parameters:

byte_data – Bytes containing one or more JSON objects.

Returns:

List of decoded JSON objects.
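A minimal sketch of decoding several JSON objects from a single byte string, using `json.JSONDecoder.raw_decode` from the standard library. It assumes UTF-8 input in which the objects are simply concatenated, optionally separated by whitespace; SSE framing such as `data:` prefixes is not handled here. The name `decode_multiple_json_sketch` is hypothetical.

```python
import json
from typing import Any, List


def decode_multiple_json_sketch(byte_data: bytes) -> List[Any]:
    text = byte_data.decode("utf-8")
    decoder = json.JSONDecoder()
    objects, index = [], 0
    while index < len(text):
        # raw_decode does not skip leading whitespace, so skip it manually.
        if text[index].isspace():
            index += 1
            continue
        # raw_decode returns the parsed object and the index where it ended.
        obj, index = decoder.raw_decode(text, index)
        objects.append(obj)
    return objects


decoded = decode_multiple_json_sketch(b'{"a": 1}{"b": 2}\n{"c": [3]}')
# decoded == [{"a": 1}, {"b": 2}, {"c": [3]}]
```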

autorag.utils.util.demojize(text: str) → str[source]¶

autorag.utils.util.dict_to_markdown(d, level=1)[source]¶

Convert a dictionary to a Markdown formatted string.

Parameters:

d – Dictionary to convert
level – Current level of heading (used for nested dictionaries)

Returns:

Markdown formatted string
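The exact Markdown layout produced by the library is not shown on this page. As a rough sketch, assume each key becomes a heading at the current level, nested dictionaries recurse one level deeper, and other values are written as plain text. The name `dict_to_markdown_sketch` is hypothetical.

```python
def dict_to_markdown_sketch(d, level=1):
    lines = []
    for key, value in d.items():
        # The heading depth follows the nesting depth.
        lines.append(f"{'#' * level} {key}")
        if isinstance(value, dict):
            lines.append(dict_to_markdown_sketch(value, level + 1))
        else:
            lines.append(str(value))
    return "\n".join(lines)


markdown = dict_to_markdown_sketch({"Retrieval": {"top_k": 5}, "Note": "ok"})
# markdown == "# Retrieval\n## top_k\n5\n# Note\nok"
```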

autorag.utils.util.dict_to_markdown_table(data, key_column_name: str, value_column_name: str)[source]¶

autorag.utils.util.embedding_query_content(queries: List[str], contents_list: List[List[str]], embedding_model: str | None = None, batch: int = 128)[source]¶

autorag.utils.util.empty_cuda_cache()[source]¶

autorag.utils.util.explode(index_values: Collection[Any], explode_values: Collection[Collection[Any]])[source]¶

Explode index_values and explode_values. The index_values and explode_values must have the same length. The function flattens explode_values and repeats each index value so that every flattened value stays paired with its index value.

Parameters:

index_values – The index values.
explode_values – The values to explode.

Returns:

Tuple of exploded index_values and exploded explode_values.
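The pairing behaviour can be sketched with plain Python collections. The name `explode_sketch` is hypothetical, and returning lists inside the tuple is an assumption.

```python
from typing import Any, Collection, List, Tuple


def explode_sketch(
    index_values: Collection[Any], explode_values: Collection[Collection[Any]]
) -> Tuple[List[Any], List[Any]]:
    if len(index_values) != len(explode_values):
        raise ValueError("index_values and explode_values must have the same length")
    exploded_index, exploded_values = [], []
    for index, values in zip(index_values, explode_values):
        for value in values:
            # Repeat the index once per value so that the pairs stay aligned.
            exploded_index.append(index)
            exploded_values.append(value)
    return exploded_index, exploded_values


pairs = explode_sketch(["q1", "q2"], [["a", "b"], ["c"]])
# pairs == (["q1", "q1", "q2"], ["a", "b", "c"])
```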

autorag.utils.util.fetch_contents(corpus_data: DataFrame, ids: List[List[str]], column_name: str = 'contents') → List[List[Any]][source]¶

autorag.utils.util.fetch_one_content(corpus_data: DataFrame, id_: str, column_name: str = 'contents', id_column_name: str = 'doc_id') → Any[source]¶

autorag.utils.util.filter_dict_keys(dict_, keys: List[str])[source]¶

autorag.utils.util.find_key_values(data, target_key: str) → List[Any][source]¶

Recursively find all values for a specific key in a nested dictionary or list.

Parameters:

data – The dictionary or list to search.
target_key – The key to search for.

Returns:

A list of values associated with the target key.
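A minimal sketch of the recursive search. The name `find_key_values_sketch` is hypothetical; descending into a matched value to look for further matches is an assumption of this sketch.

```python
from typing import Any, List


def find_key_values_sketch(data: Any, target_key: str) -> List[Any]:
    found = []
    if isinstance(data, dict):
        for key, value in data.items():
            if key == target_key:
                found.append(value)
            # Keep searching inside the value in case it nests more matches.
            found.extend(find_key_values_sketch(value, target_key))
    elif isinstance(data, list):
        for item in data:
            found.extend(find_key_values_sketch(item, target_key))
    return found


config = {
    "module_type": "bm25",
    "nodes": [{"module_type": "vectordb", "params": {"top_k": 3}}, {"other": 1}],
}
module_types = find_key_values_sketch(config, "module_type")
# module_types == ["bm25", "vectordb"]
```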

autorag.utils.util.find_node_summary_files(trial_dir: str) → List[str][source]¶

autorag.utils.util.find_trial_dir(project_dir: str) → List[str][source]¶

autorag.utils.util.flatten_apply(func: Callable, nested_list: List[List[Any]], **kwargs) → List[List[Any]][source]¶

Parameters:

func – The function that applies to the flattened list.
nested_list – The nested list to be flattened.

Returns:

The list that is reconstructed after applying the function.

autorag.utils.util.get_best_row(summary_df: DataFrame, best_column_name: str = 'is_best') → Series[source]¶

Find the best result row in the summary dataframe using the ‘is_best’ column and return it.

Parameters:

summary_df – Summary dataframe created by AutoRAG.
best_column_name – The column name that indicates the best result. Default is ‘is_best’. You don’t have to change this unless the column name is different.

Returns:

The best row as a pandas Series instance.

autorag.utils.util.get_event_loop() → AbstractEventLoop[source]¶

Get asyncio event loop safely.

autorag.utils.util.load_summary_file(summary_path: str, dict_columns: List[str] | None = None) → DataFrame[source]¶

Load a summary file from summary_path.

Parameters:

summary_path – The path of the summary file.
dict_columns – The columns that hold dictionary values. You must set this parameter for the summary file to load properly. Default is [‘module_params’].

Returns:

The summary dataframe.

autorag.utils.util.load_yaml_config(yaml_path: str) → Dict[source]¶

Load a YAML configuration file for AutoRAG. Loading includes safe YAML loading, converting strings to tuples, and inserting environment variables.

Parameters:

yaml_path – The path of the YAML configuration file.

Returns:

The loaded configuration dictionary.

autorag.utils.util.make_batch(elems: List[Any], batch_size: int) → List[List[Any]][source]¶

Make a batch of elems with batch_size.
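Batching by size can be sketched with list slicing. The name `make_batch_sketch` is hypothetical; in this sketch the last batch may be shorter than batch_size.

```python
from typing import Any, List


def make_batch_sketch(elems: List[Any], batch_size: int) -> List[List[Any]]:
    # Step through the list in strides of batch_size.
    return [elems[i:i + batch_size] for i in range(0, len(elems), batch_size)]


batches = make_batch_sketch([1, 2, 3, 4, 5], 2)
# batches == [[1, 2], [3, 4], [5]]
```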

autorag.utils.util.make_combinations(target_dict: Dict[str, Any]) → List[Dict[str, Any]][source]¶

Make combinations from target_dict. Each key of target_dict must be a string, and each value can be a list of values or a single value. It generates all combinations of values from target_dict, which means generating dictionaries that contain exactly one value for each key, and all dictionaries will be different from each other.

Parameters:

target_dict – The target dictionary.

Returns:

The list of generated dictionaries.
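The combination step can be sketched with `itertools.product`. The name `make_combinations_sketch` is hypothetical, and treating every non-list value as a single candidate is an assumption of this sketch.

```python
from itertools import product
from typing import Any, Dict, List


def make_combinations_sketch(target_dict: Dict[str, Any]) -> List[Dict[str, Any]]:
    keys = list(target_dict.keys())
    # Wrap single values so that every key contributes a list of candidates.
    candidates = [
        value if isinstance(value, list) else [value]
        for value in target_dict.values()
    ]
    return [dict(zip(keys, combo)) for combo in product(*candidates)]


combos = make_combinations_sketch({"top_k": [1, 2], "model": "bm25"})
# combos == [{"top_k": 1, "model": "bm25"}, {"top_k": 2, "model": "bm25"}]
```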

autorag.utils.util.normalize_string(s: str) → str[source]¶

Taken from the official evaluation script for v1.1 of the SQuAD dataset. Lowercases the text and removes punctuation, articles, and extra whitespace.
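The normalization steps can be sketched as below, following the structure of the SQuAD v1.1 evaluation script (lowercase, strip punctuation, drop the articles a, an, and the, collapse whitespace). The name `normalize_string_sketch` is hypothetical, and AutoRAG's version may differ in detail.

```python
import re
import string


def normalize_string_sketch(s: str) -> str:
    text = s.lower()
    # Drop ASCII punctuation characters.
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    # Replace English articles with a space.
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


normalized = normalize_string_sketch("The  Quick, brown fox!")
# normalized == "quick brown fox"
```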

autorag.utils.util.normalize_unicode(text: str) → str[source]¶

autorag.utils.util.openai_truncate_by_token(texts: List[str], token_limit: int, model_name: str) → List[str][source]¶

autorag.utils.util.pop_params(func: Callable, kwargs: Dict) → Dict[source]¶

Pop parameters from the given func and return them. Parameters such as “self” or “cls” are removed automatically.

Parameters:

func – The function whose parameters are popped.
kwargs – The kwargs to pop the parameters from.

Returns:

The popped parameters.
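A minimal sketch using `inspect.signature` to keep only the keyword arguments that `func` accepts. The names `pop_params_sketch` and `Retriever` are hypothetical, and returning a new dictionary without mutating `kwargs` is an assumption of this sketch.

```python
import inspect
from typing import Callable, Dict


def pop_params_sketch(func: Callable, kwargs: Dict) -> Dict:
    # Collect the parameter names of func, ignoring "self" and "cls".
    names = [
        name
        for name in inspect.signature(func).parameters
        if name not in ("self", "cls")
    ]
    return {name: kwargs[name] for name in names if name in kwargs}


class Retriever:
    # Hypothetical class used only for this example.
    def __init__(self, top_k, model="bm25"):
        self.top_k = top_k
        self.model = model


params = pop_params_sketch(Retriever.__init__, {"top_k": 3, "self": "x", "batch": 8})
# params == {"top_k": 3}
```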

autorag.utils.util.preprocess_text(text: str) → str[source]¶

async autorag.utils.util.process_batch(tasks, batch_size: int = 64) → List[Any][source]¶

Processes tasks in batches asynchronously.

Parameters:

tasks – A list of no-argument functions or coroutines to be executed.
batch_size – The number of tasks to process in a single batch. Default is 64.

Returns:

A list of results from the processed tasks.
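Batched asynchronous execution can be sketched with `asyncio.gather`. This sketch handles only the coroutine case described above; the names `process_batch_sketch` and `square` are hypothetical.

```python
import asyncio
from typing import Any, List


async def process_batch_sketch(tasks, batch_size: int = 64) -> List[Any]:
    results = []
    for start in range(0, len(tasks), batch_size):
        batch = tasks[start:start + batch_size]
        # gather runs one batch concurrently and preserves input order.
        results.extend(await asyncio.gather(*batch))
    return results


async def square(x: int) -> int:
    await asyncio.sleep(0)
    return x * x


async def main() -> List[int]:
    return await process_batch_sketch([square(i) for i in range(5)], batch_size=2)


squares = asyncio.run(main())
# squares == [0, 1, 4, 9, 16]
```

Limiting each `gather` call to batch_size coroutines caps how many tasks run at once, which can help when the tasks call a rate-limited service.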

autorag.utils.util.reconstruct_list(flat_list: List[Any], lengths: List[int]) → List[List[Any]][source]¶

autorag.utils.util.replace_value_in_dict(target_dict: Dict, key: str, replace_value: Any) → Dict[source]¶

Replace the value of a certain key in target_dict. If target_dict does not contain the targeted key, it will return target_dict.

Parameters:

target_dict – The target dictionary.
key – The key whose value is to be replaced.
replace_value – The replacement value.

Returns:

The dictionary with the value replaced.

autorag.utils.util.result_to_dataframe(column_names: List[str])[source]¶

Decorator for converting results to pd.DataFrame.

autorag.utils.util.save_parquet_safe(df: DataFrame, filepath: str, upsert: bool = False)[source]¶

autorag.utils.util.select_top_k(df, column_names: List[str], top_k: int)[source]¶

autorag.utils.util.sort_by_scores(row, reverse=True)[source]¶

Sorts each row by the ‘scores’ column. The input column names must be ‘contents’, ‘ids’, and ‘scores’, and their elements must be lists.
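The per-row sort can be sketched on a plain dictionary that stands in for a dataframe row. The name `sort_by_scores_sketch` is hypothetical, and returning the three sorted lists in the order contents, ids, scores is an assumption of this sketch.

```python
def sort_by_scores_sketch(row, reverse=True):
    # Zip the three parallel lists so that they are reordered together.
    triples = sorted(
        zip(row["contents"], row["ids"], row["scores"]),
        key=lambda triple: triple[2],
        reverse=reverse,
    )
    if not triples:
        return [], [], []
    contents, ids, scores = zip(*triples)
    return list(contents), list(ids), list(scores)


row = {"contents": ["c1", "c2", "c3"], "ids": ["a", "b", "c"], "scores": [0.2, 0.9, 0.5]}
sorted_row = sort_by_scores_sketch(row)
# sorted_row == (["c2", "c3", "c1"], ["b", "c", "a"], [0.9, 0.5, 0.2])
```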

autorag.utils.util.split_dataframe(df, chunk_size)[source]¶

autorag.utils.util.to_list(item)[source]¶

Recursively convert collections to Python lists.
