autorag.data.parse package

Submodules

autorag.data.parse.base module

autorag.data.parse.base.parser_node(func)[source]

autorag.data.parse.clova module

autorag.data.parse.langchain_parse module

autorag.data.parse.langchain_parse.langchain_parse(data_path_list: List[str], parse_method: str, **kwargs) → Tuple[List[str], List[str], List[int]][source]

Parse documents using a langchain document_loaders (parse) method.

Parameters:
  • data_path_list – The list of data paths to parse.

  • parse_method – A langchain document_loaders(parse) method to use.

  • kwargs – The extra parameters for creating the langchain document_loaders(parse) instance.

Returns:

A tuple of lists containing the parsed texts, paths, and pages.
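The return value can be regrouped into one record per parsed entry. The following is a minimal sketch, assuming the three lists are aligned by index (entry i of each list describes the same parsed chunk). The helper name to_records and the sample values are hypothetical and stand in for a real langchain_parse call; they are not part of autorag.

```python
from typing import Dict, List, Tuple, Union

ParseResult = Tuple[List[str], List[str], List[int]]


def to_records(parsed: ParseResult) -> List[Dict[str, Union[str, int]]]:
    """Zip the (texts, paths, pages) lists into one dict per parsed entry."""
    texts, paths, pages = parsed
    # Assumption: the three lists are index-aligned, so their lengths must match.
    if not (len(texts) == len(paths) == len(pages)):
        raise ValueError("texts, paths and pages must have the same length")
    return [
        {"text": text, "path": path, "page": page}
        for text, path, page in zip(texts, paths, pages)
    ]


# Hypothetical sample shaped like the documented return value.
sample: ParseResult = (
    ["first page text", "second page text"],
    ["docs/a.pdf", "docs/a.pdf"],
    [1, 2],
)
records = to_records(sample)
```

Keeping the path and page next to each text makes it easier to trace a chunk back to its source file later in the pipeline.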

autorag.data.parse.langchain_parse.langchain_parse_pure(data_path: str, parse_method: str, kwargs) → Tuple[List[str], List[str], List[int]][source]

Parses a single file using the specified parse method.

Parameters:
  • data_path (str) – The file path to parse.

  • parse_method (str) – The parsing method to use.

  • kwargs (Dict) – Additional keyword arguments for the parsing method.

Returns:

Tuple[List[str], List[str], List[int]]: A tuple of lists containing the parsed texts, the file paths, and the pages.

autorag.data.parse.langchain_parse.parse_all_files(data_path_list: List[str], parse_method: str, **kwargs) → Tuple[List[str], List[str]][source]

autorag.data.parse.llamaparse module

autorag.data.parse.llamaparse.llama_parse(data_path_list: List[str], batch: int = 8, use_vendor_multimodal_model: bool = False, vendor_multimodal_model_name: str = 'openai-gpt4o', use_own_key: bool = False, vendor_multimodal_api_key: str = None, **kwargs) → Tuple[List[str], List[str], List[int]][source]

Parse documents using llama_parse. The LLAMA_CLOUD_API_KEY environment variable must be set. You can get a key from https://cloud.llamaindex.ai/api-key

Parameters:
  • data_path_list – The list of data paths to parse.

  • batch – The batch size for parsing documents. Default is 8.

  • use_vendor_multimodal_model – Whether to use the vendor multimodal model. Default is False.

  • vendor_multimodal_model_name – The name of the vendor multimodal model. Default is “openai-gpt4o”.

  • use_own_key – Whether to use your own API key. Default is False.

  • vendor_multimodal_api_key – The API key for the vendor multimodal model.

  • kwargs – The extra parameters for creating the llama_parse instance.

Returns:

A tuple of lists containing the parsed texts, paths, and pages.
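To illustrate what a batch size means for asynchronous parsing, here is a minimal sketch that processes paths in groups of batch and merges the per-file results into three lists. It is an assumption about how such batching can be done, not autorag's implementation; parse_in_batches and fake_parse are hypothetical names, and fake_parse replaces a real per-file parser so that the example runs without an API key.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

ParseResult = Tuple[List[str], List[str], List[int]]


async def parse_in_batches(
    paths: List[str],
    parse_one: Callable[[str], Awaitable[ParseResult]],
    batch: int = 8,
) -> ParseResult:
    """Run parse_one over paths, at most `batch` files concurrently."""
    if batch < 1:
        raise ValueError("batch must be a positive integer")
    texts: List[str] = []
    out_paths: List[str] = []
    pages: List[int] = []
    for start in range(0, len(paths), batch):
        chunk = paths[start:start + batch]
        # gather returns results in the same order as the input coroutines.
        results = await asyncio.gather(*(parse_one(p) for p in chunk))
        for file_texts, file_paths, file_pages in results:
            texts.extend(file_texts)
            out_paths.extend(file_paths)
            pages.extend(file_pages)
    return texts, out_paths, pages


async def fake_parse(path: str) -> ParseResult:
    # Hypothetical stand-in: pretends every file has exactly one page.
    await asyncio.sleep(0)
    return [f"text of {path}"], [path], [1]


result = asyncio.run(
    parse_in_batches(["a.pdf", "b.pdf", "c.pdf"], fake_parse, batch=2)
)
```

A smaller batch lowers the number of concurrent requests, which can help when the parsing service enforces rate limits.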

async autorag.data.parse.llamaparse.llama_parse_pure(data_path: str, parse_instance) → Tuple[List[str], List[str], List[int]][source]

autorag.data.parse.run module

autorag.data.parse.run.run_parser(modules: List[Callable], module_params: List[Dict], data_path_glob: str, project_dir: str)[source]

autorag.data.parse.table_hybrid_parse module

Module contents