autorag package¶
Subpackages¶
- autorag.data package
- autorag.evaluation package
- autorag.nodes package
- Subpackages
- autorag.nodes.generator package
- autorag.nodes.passageaugmenter package
- autorag.nodes.passagecompressor package
- Submodules
- autorag.nodes.passagecompressor.base module
- autorag.nodes.passagecompressor.longllmlingua module
- autorag.nodes.passagecompressor.pass_compressor module
- autorag.nodes.passagecompressor.refine module
- autorag.nodes.passagecompressor.run module
- autorag.nodes.passagecompressor.tree_summarize module
- Module contents
- autorag.nodes.passagefilter package
- Submodules
- autorag.nodes.passagefilter.base module
- autorag.nodes.passagefilter.pass_passage_filter module
- autorag.nodes.passagefilter.percentile_cutoff module
- autorag.nodes.passagefilter.recency module
- autorag.nodes.passagefilter.run module
- autorag.nodes.passagefilter.similarity_percentile_cutoff module
- autorag.nodes.passagefilter.similarity_threshold_cutoff module
- autorag.nodes.passagefilter.threshold_cutoff module
- Module contents
- autorag.nodes.passagereranker package
- Subpackages
- Submodules
- autorag.nodes.passagereranker.base module
- autorag.nodes.passagereranker.cohere module
- autorag.nodes.passagereranker.colbert module
- autorag.nodes.passagereranker.flag_embedding module
- autorag.nodes.passagereranker.flag_embedding_llm module
- autorag.nodes.passagereranker.jina module
- autorag.nodes.passagereranker.koreranker module
- autorag.nodes.passagereranker.monot5 module
- autorag.nodes.passagereranker.pass_reranker module
- autorag.nodes.passagereranker.rankgpt module
- autorag.nodes.passagereranker.run module
- autorag.nodes.passagereranker.sentence_transformer module
- autorag.nodes.passagereranker.time_reranker module
- autorag.nodes.passagereranker.upr module
- Module contents
- autorag.nodes.promptmaker package
- autorag.nodes.queryexpansion package
- Submodules
- autorag.nodes.queryexpansion.base module
- autorag.nodes.queryexpansion.hyde module
- autorag.nodes.queryexpansion.multi_query_expansion module
- autorag.nodes.queryexpansion.pass_query_expansion module
- autorag.nodes.queryexpansion.query_decompose module
- autorag.nodes.queryexpansion.run module
- Module contents
- autorag.nodes.retrieval package
- Module contents
- Subpackages
- autorag.schema package
- Submodules
- autorag.schema.metricinput module
MetricInput
MetricInput.from_dataframe()
MetricInput.generated_log_probs
MetricInput.generated_texts
MetricInput.generation_gt
MetricInput.is_fields_notnone()
MetricInput.prompt
MetricInput.queries
MetricInput.query
MetricInput.retrieval_gt
MetricInput.retrieval_gt_contents
MetricInput.retrieved_contents
MetricInput.retrieved_ids
- autorag.schema.module module
- autorag.schema.node module
- Module contents
- autorag.utils package
- Submodules
- autorag.utils.preprocess module
- autorag.utils.util module
convert_datetime_string()
convert_env_in_dict()
convert_inputs_to_list()
convert_string_to_tuple_in_dict()
dict_to_markdown()
dict_to_markdown_table()
embedding_query_content()
explode()
fetch_contents()
fetch_one_content()
filter_dict_keys()
find_key_values()
find_node_summary_files()
find_trial_dir()
flatten_apply()
get_best_row()
get_event_loop()
load_summary_file()
make_batch()
make_combinations()
normalize_string()
normalize_unicode()
openai_truncate_by_token()
process_batch()
reconstruct_list()
replace_value_in_dict()
result_to_dataframe()
save_parquet_safe()
select_top_k()
sort_by_scores()
split_dataframe()
to_list()
- Module contents
Submodules¶
autorag.chunker module¶
autorag.cli module¶
autorag.dashboard module¶
autorag.deploy module¶
- class autorag.deploy.Runner(config: Dict, project_dir: str | None = None)[source]¶
Bases:
object
- classmethod from_trial_folder(trial_path: str)[source]¶
Load Runner from an evaluated trial folder. The trial must already have been evaluated using the Evaluator class. The project_dir is set to the parent directory of the trial folder.
- Parameters:
trial_path – The path of the trial folder.
- Returns:
Initialized Runner.
- classmethod from_yaml(yaml_path: str, project_dir: str | None = None)[source]¶
Load Runner from a YAML file. The YAML file must have been extracted from an evaluated trial using the extract_best_config method.
- Parameters:
yaml_path – The path of the yaml file.
project_dir – The path of the project directory. Default is the current directory.
- Returns:
Initialized Runner.
- run(query: str, result_column: str = 'generated_texts')[source]¶
Run the pipeline with a query. The loaded pipeline must start from a single query, so its first module must be a query_expansion or retrieval module.
- Parameters:
query – The query of the user.
result_column – The result column name for the answer. Default is generated_texts, which is the output of the generation module.
- Returns:
The result of the pipeline.
- run_api_server(host: str = '0.0.0.0', port: int = 8000, **kwargs)[source]¶
Run the pipeline as an API server. You can send a POST request to http://host:port/run with a JSON body like below:
{ "query": "your query", "result_column": "generated_texts" }
It returns a JSON response like below:
{ "answer": "your answer" }
- Parameters:
host – The host of the api server.
port – The port of the api server.
kwargs – Other arguments for Flask app.run.
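The request/response contract above can be sketched with the standard library. This is only an illustration: the host, port, and payload values are placeholders, and a server started with run_api_server must actually be listening before the request is sent.

```python
import json
import urllib.request

# Build the documented request body for POST http://host:port/run.
# The query text and result_column value here are placeholders.
payload = {"query": "your query", "result_column": "generated_texts"}
body = json.dumps(payload).encode("utf-8")

# Constructing the request only; sending it with urllib.request.urlopen(req)
# requires a server started via Runner.run_api_server().
req = urllib.request.Request(
    "http://localhost:8000/run",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# The server's JSON response has the documented shape {"answer": "..."}.
def parse_answer(raw: bytes) -> str:
    return json.loads(raw)["answer"]
```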
- run_web(server_name: str = '0.0.0.0', server_port: int = 7680, share: bool = False, **kwargs)[source]¶
Run a web interface to interact with the pipeline. You can access the web interface at http://server_name:server_port in your browser.
- Parameters:
server_name – The host of the web. Default is 0.0.0.0.
server_port – The port of the web. Default is 7680.
share – Whether to create a publicly shareable link. Default is False.
kwargs – Other arguments for gr.ChatInterface.launch.
- class autorag.deploy.RunnerInput(*, query: str, result_column: str = 'generated_texts')[source]¶
Bases:
BaseModel
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}¶
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'query': FieldInfo(annotation=str, required=True), 'result_column': FieldInfo(annotation=str, required=False, default='generated_texts')}¶
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.
This replaces Model.__fields__ from Pydantic V1.
- query: str¶
- result_column: str¶
- autorag.deploy.extract_best_config(trial_path: str, output_path: str | None = None) Dict [source]¶
Extract the optimal pipeline from evaluated trial.
- Parameters:
trial_path – The path to the trial directory that you want to extract the pipeline from. Must already be evaluated.
output_path – Output path where the pipeline YAML file will be saved. Must be a .yaml or .yml file. If None, it does not save a YAML file and just returns the dict values. Default is None.
- Returns:
The dictionary of the extracted pipeline.
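The output_path rule above can be sketched as a small check. This is an illustrative validation, not the library's implementation; the function name is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Illustrative sketch (not the library code): extract_best_config accepts
# only a .yaml or .yml output_path, or None to skip saving the file.
def validate_output_path(output_path: Optional[str]) -> bool:
    if output_path is None:
        return True  # no file is saved; the dict is returned directly
    return Path(output_path).suffix in (".yaml", ".yml")
```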
- autorag.deploy.extract_node_line_names(config_dict: Dict) List[str] [source]¶
Extract node line names with the given config dictionary order.
- Parameters:
config_dict – The YAML configuration dict for the pipeline. You can load it from trial_folder/config.yaml.
- Returns:
The list of node line names. It is the order of the node line names in the pipeline.
- autorag.deploy.extract_node_strategy(config_dict: Dict) Dict [source]¶
Extract node strategies with the given config dictionary. The return value is a dictionary of node type and its strategy.
- Parameters:
config_dict – The YAML configuration dict for the pipeline. You can load it from trial_folder/config.yaml.
- Returns:
Key is node_type and value is strategy dict.
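A minimal sketch of the mapping this function produces, assuming the usual AutoRAG config layout with node_lines and nodes keys; the config values below are placeholders, and the library's actual implementation may differ.

```python
from typing import Any, Dict

# Illustrative sketch of the node_type -> strategy mapping that
# extract_node_strategy returns (not the library source).
def node_strategies(config_dict: Dict[str, Any]) -> Dict[str, Dict]:
    strategies: Dict[str, Dict] = {}
    for node_line in config_dict.get("node_lines", []):
        for node in node_line.get("nodes", []):
            strategies[node["node_type"]] = node.get("strategy", {})
    return strategies

# Placeholder config following the node_lines/nodes convention.
config = {
    "node_lines": [
        {
            "node_line_name": "retrieve_node_line",
            "nodes": [
                {
                    "node_type": "retrieval",
                    "strategy": {"metrics": ["retrieval_f1"]},
                    "modules": [{"module_type": "bm25"}],
                }
            ],
        }
    ]
}
```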
- autorag.deploy.summary_df_to_yaml(summary_df: DataFrame, config_dict: Dict) Dict [source]¶
Convert trial summary dataframe to config yaml file.
- Parameters:
summary_df – The trial summary dataframe of the evaluated trial.
config_dict – The YAML configuration dict for the pipeline. You can load it from trial_folder/config.yaml.
- Returns:
Dictionary of config yaml file. You can save this dictionary to yaml file.
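For illustration, a config dict extracted from a trial and saved to YAML typically takes a shape like the following; the node line name, module type, and metrics here are placeholders, not the output of any specific trial.

```yaml
node_lines:
  - node_line_name: retrieve_node_line
    nodes:
      - node_type: retrieval
        strategy:
          metrics: [retrieval_f1, retrieval_recall]
        modules:
          - module_type: bm25
```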
autorag.evaluator module¶
autorag.node_line module¶
- autorag.node_line.make_node_lines(node_line_dict: Dict) List[Node] [source]¶
Make a list of Nodes from a node line dictionary.
- Parameters:
node_line_dict – The node line dict loaded from a YAML file, or given by user input.
- Returns:
The list of Nodes inside this node line.
- autorag.node_line.run_node_line(nodes: List[Node], node_line_dir: str, previous_result: DataFrame | None = None)[source]¶
Run the whole node line by running each node.
- Parameters:
nodes – A list of nodes.
node_line_dir – This node line’s directory.
previous_result – A result of the previous node line. If None, it loads qa data from data/qa.parquet.
- Returns:
The final result of the node line.
autorag.parser module¶
autorag.strategy module¶
- autorag.strategy.avoid_empty_result(return_index: List[int])[source]¶
Decorator for avoiding empty results from a function. If the function returns an empty result or None, the original inputs are returned instead. If the return value is a tuple, every value or list in it is checked; if all are empty, the original inputs are returned. The parameters at return_index of the function are used as the original results.
- Parameters:
return_index – The index of the result to be returned when there is no result.
- Returns:
The origin results or the results from the function.
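The behavior described above can be sketched as a decorator. This is an illustrative reimplementation, not the library source, and the single-element unwrapping is an assumption.

```python
from functools import wraps
from typing import List

# Illustrative sketch: if the wrapped function returns None, or a result
# whose values/lists are all empty, fall back to the original arguments
# at the positions given by return_index.
def avoid_empty_result_sketch(return_index: List[int]):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            fallback = tuple(args[i] for i in return_index)
            result = func(*args, **kwargs)
            if result is None:
                return fallback[0] if len(fallback) == 1 else fallback
            values = result if isinstance(result, tuple) else (result,)
            if all(not v for v in values):  # every value or list is empty
                return fallback[0] if len(fallback) == 1 else fallback
            return result
        return wrapper
    return decorator

@avoid_empty_result_sketch(return_index=[0])
def drop_short(contents):
    # Hypothetical function: keep only strings longer than 5 characters.
    return [c for c in contents if len(c) > 5]
```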
- autorag.strategy.filter_by_threshold(results, value, threshold, metadatas=None) Tuple[List, List] [source]¶
Filter results by value’s threshold.
- Parameters:
results – The result list to be filtered.
value – The value list to be filtered. It must have the same length as results.
threshold – The threshold value.
metadatas – The metadata of each result.
- Returns:
Filtered list of results and filtered list of metadatas. Metadatas are returned even if no input metadatas were given.
- Return type:
Tuple[List, List]
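A minimal sketch of this filtering, assuming entries with values at or below the threshold are kept; the comparison direction is an assumption not stated in the docstring.

```python
from typing import Any, List, Optional, Tuple

# Illustrative sketch of filter_by_threshold (not the library source).
def filter_by_threshold_sketch(
    results: List[Any],
    value: List[float],
    threshold: float,
    metadatas: Optional[List[Any]] = None,
) -> Tuple[List, List]:
    assert len(results) == len(value)  # lists must have the same length
    if metadatas is None:
        metadatas = [None] * len(results)
    kept = [(r, m) for r, v, m in zip(results, value, metadatas) if v <= threshold]
    filtered_results = [r for r, _ in kept]
    filtered_metadatas = [m for _, m in kept]
    return filtered_results, filtered_metadatas
```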
- autorag.strategy.measure_speed(func, *args, **kwargs)[source]¶
Method for measuring execution speed of the function.
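A minimal sketch, assuming the function returns the result paired with the elapsed wall-clock time in seconds; the return shape is an assumption.

```python
import time

# Illustrative sketch of measure_speed: run the function and return its
# result together with the elapsed wall-clock time in seconds.
def measure_speed_sketch(func, *args, **kwargs):
    start = time.time()
    result = func(*args, **kwargs)
    return result, time.time() - start
```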
- autorag.strategy.select_best(results: List[DataFrame], columns: Iterable[str], metadatas: List[Any] | None = None, strategy_name: str = 'mean') Tuple[DataFrame, Any] [source]¶
- autorag.strategy.select_best_average(results: List[DataFrame], columns: Iterable[str], metadatas: List[Any] | None = None) Tuple[DataFrame, Any] [source]¶
Select the best result by average value among given columns.
- Parameters:
results – The list of results. Each result must be pd.DataFrame.
columns – Column names to be averaged, which are the criterion for selecting the best result.
metadatas – The metadata of each result. It will select one metadata with the best result.
- Returns:
The best result and the best metadata. The metadata is returned even if the metadatas parameter was not given.
- Return type:
Tuple[pd.DataFrame, Any]
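The selection logic can be sketched with plain dicts standing in for the pandas DataFrames the real function uses; this is an illustrative reimplementation, not the library source.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Illustrative sketch: average the metric columns of each result and pick
# the result (and its metadata) with the highest mean.
def select_best_average_sketch(
    results: List[Dict[str, List[float]]],
    columns: Iterable[str],
    metadatas: Optional[List[Any]] = None,
) -> Tuple[Dict[str, List[float]], Any]:
    columns = list(columns)  # allow a generator to be passed in
    if metadatas is None:
        metadatas = [None] * len(results)

    def mean_score(result: Dict[str, List[float]]) -> float:
        per_column = [sum(result[c]) / len(result[c]) for c in columns]
        return sum(per_column) / len(per_column)

    best = max(range(len(results)), key=lambda i: mean_score(results[i]))
    return results[best], metadatas[best]
```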
- autorag.strategy.select_best_rr(results: List[DataFrame], columns: Iterable[str], metadatas: List[Any] | None = None) Tuple[DataFrame, Any] [source]¶
autorag.support module¶
autorag.validator module¶
autorag.web module¶
Module contents¶
- class autorag.MockEmbeddingRandom(embed_dim: int, *, model_name: str = 'unknown', embed_batch_size: Annotated[int, Gt(gt=0), Le(le=2048)] = 10, callback_manager: CallbackManager = None, num_workers: int | None = None)[source]¶
Bases:
MockEmbedding
Mock embedding with random vectors.
- embed_dim: int¶
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}¶
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'protected_namespaces': ('pydantic_model_',)}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'callback_manager': FieldInfo(annotation=CallbackManager, required=False, default_factory=<lambda>, exclude=True), 'embed_batch_size': FieldInfo(annotation=int, required=False, default=10, description='The batch size for embedding calls.', metadata=[Gt(gt=0), Le(le=2048)]), 'embed_dim': FieldInfo(annotation=int, required=True), 'model_name': FieldInfo(annotation=str, required=False, default='unknown', description='The name of the embedding model.'), 'num_workers': FieldInfo(annotation=Union[int, NoneType], required=False, default=None, description='The number of workers to use for async embedding calls.')}¶
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.
This replaces Model.__fields__ from Pydantic V1.
- autorag.random() → x in the interval [0, 1)¶