autorag.data.parse package
Submodules
autorag.data.parse.base module
autorag.data.parse.clova module
autorag.data.parse.langchain_parse module
- autorag.data.parse.langchain_parse.langchain_parse(data_path_list: List[str], parse_method: str, **kwargs) -> Tuple[List[str], List[str], List[int]]
Parse documents using a langchain document_loaders (parse) method.
- Parameters:
data_path_list – The list of data paths to parse.
parse_method – The langchain document_loaders (parse) method to use.
kwargs – The extra parameters for creating the langchain document_loaders(parse) instance.
- Returns:
A tuple of three lists: the parsed texts, the file paths, and the page numbers.
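The return type above is three separate lists. As a minimal sketch, assuming the lists are index-aligned (entry i of each list describes the same parsed page), they can be combined into one record per page. The helper `to_records` and the sample data are hypothetical and not part of autorag; the sample merely stands in for a real return value.

```python
from typing import Dict, List, Tuple, Union

# Shape of the documented return value: (texts, paths, pages).
ParseResult = Tuple[List[str], List[str], List[int]]


def to_records(result: ParseResult) -> List[Dict[str, Union[str, int]]]:
    """Zip the three parallel lists into one dict per parsed page.

    Hypothetical helper; assumes the lists are index-aligned.
    """
    texts, paths, pages = result
    if not (len(texts) == len(paths) == len(pages)):
        raise ValueError("texts, paths and pages must have the same length")
    return [
        {"text": text, "path": path, "page": page}
        for text, path, page in zip(texts, paths, pages)
    ]


# Made-up sample standing in for a real return value.
sample: ParseResult = (
    ["first page text", "second page text"],
    ["docs/a.pdf", "docs/a.pdf"],
    [1, 2],
)
records = to_records(sample)
```

Working with records rather than three loose lists makes it harder to filter or sort one list and forget the other two.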
- autorag.data.parse.langchain_parse.langchain_parse_pure(data_path: str, parse_method: str, kwargs) -> Tuple[List[str], List[str], List[int]]
Parses a single file using the specified parse method.
- Parameters:
data_path (str) – The file path to parse.
parse_method (str) – The parsing method to use.
kwargs (Dict) – Additional keyword arguments for the parsing method.
- Returns:
Tuple[List[str], List[str], List[int]]: A tuple containing the parsed texts, the file paths, and the page numbers.
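Because the single-file function is annotated with the same three-list return type as the multi-file one, per-file results can in principle be concatenated into one combined result. The sketch below illustrates that idea only; it is not taken from the autorag source. `fake_parse_single` is a hypothetical stand-in for a per-file parser, and `merge_results` is likewise an assumed helper.

```python
from typing import List, Tuple

ParseResult = Tuple[List[str], List[str], List[int]]


def fake_parse_single(data_path: str) -> ParseResult:
    """Hypothetical stand-in for a per-file parser: pretends each file has two pages."""
    texts = [f"{data_path} page {n}" for n in (1, 2)]
    return texts, [data_path, data_path], [1, 2]


def merge_results(results: List[ParseResult]) -> ParseResult:
    """Concatenate per-file (texts, paths, pages) tuples, preserving file order."""
    texts: List[str] = []
    paths: List[str] = []
    pages: List[int] = []
    for file_texts, file_paths, file_pages in results:
        texts.extend(file_texts)
        paths.extend(file_paths)
        pages.extend(file_pages)
    return texts, paths, pages


merged = merge_results([fake_parse_single(p) for p in ["a.pdf", "b.pdf"]])
```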
autorag.data.parse.llamaparse module
- autorag.data.parse.llamaparse.llama_parse(data_path_list: List[str], batch: int = 8, use_vendor_multimodal_model: bool = False, vendor_multimodal_model_name: str = 'openai-gpt4o', use_own_key: bool = False, vendor_multimodal_api_key: str = None, **kwargs) -> Tuple[List[str], List[str], List[int]]
Parse documents using llama_parse. The LLAMA_CLOUD_API_KEY environment variable must be set. You can get the key from https://cloud.llamaindex.ai/api-key
- Parameters:
data_path_list – The list of data paths to parse.
batch – The batch size for parsing documents. Default is 8.
use_vendor_multimodal_model – Whether to use the vendor multimodal model. Default is False.
vendor_multimodal_model_name – The name of the vendor multimodal model. Default is “openai-gpt4o”.
use_own_key – Whether to use your own API key. Default is False.
vendor_multimodal_api_key – The API key for the vendor multimodal model.
kwargs – The extra parameters for creating the llama_parse instance.
- Returns:
A tuple of three lists: the parsed texts, the file paths, and the page numbers.
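Two of the requirements above lend themselves to a quick pre-flight check: the LLAMA_CLOUD_API_KEY environment variable and the batch size. The sketch below is an illustration only. How llama_parse schedules batches internally is not stated here, so `split_into_batches` simply shows what a batch size of 8 would mean for a list of paths; both helper names and the file paths are hypothetical.

```python
import os
from typing import List, Mapping


def has_llama_key(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if LLAMA_CLOUD_API_KEY is present and non-empty."""
    return bool(env.get("LLAMA_CLOUD_API_KEY"))


def split_into_batches(paths: List[str], batch: int = 8) -> List[List[str]]:
    """Split paths into consecutive chunks of at most `batch` items.

    Assumed illustration of the `batch` parameter, not autorag's own logic.
    """
    if batch < 1:
        raise ValueError("batch must be at least 1")
    return [paths[i : i + batch] for i in range(0, len(paths), batch)]


# Made-up file paths for illustration.
paths = [f"docs/file_{i}.pdf" for i in range(20)]
batches = split_into_batches(paths)
```

Checking for the key before starting a long parsing run avoids discovering a missing credential only after the first request fails.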