Langchain Chunk¶
Chunk parsed results using LangChain text splitters.
Available Chunk Method¶
1. Token¶
2. Character¶
3. Sentence¶
konlpy: For Korean 🇰🇷
Example YAML¶
modules:
  - module_type: langchain_chunk
    parse_method: konlpy
    add_file_name: korean
Using a Langchain Chunk Method Not Listed in Available Chunk Method¶
You can find more information about LangChain chunk methods in the LangChain text splitter documentation.
How to Use¶
If you want to use PythonCodeTextSplitter, which is not among the available chunk methods, you can register it with the following code.
from autorag.data import chunk_modules
from langchain.text_splitter import PythonCodeTextSplitter

# Register the splitter class under a lowercase key.
chunk_modules["python"] = PythonCodeTextSplitter
Attention
Every key in chunk_modules must be written in lowercase.
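The lowercase rule is easier to see with a small, self-contained sketch of a name-to-class registry. This is only an illustration under stated assumptions: the `chunk_modules` dict below is a local stand-in (not the one imported from `autorag.data`), `LineSplitter`, `register`, and `get_splitter` are hypothetical names, and the idea that the configured method name is lowercased before lookup is an assumption rather than confirmed AutoRAG behavior.

```python
# Hypothetical stand-in for a chunk-method registry: a plain dict that
# maps lowercase method names to splitter classes.
chunk_modules = {}


class LineSplitter:
    """Toy splitter with a split_text() method, for illustration only."""

    def split_text(self, text):
        # Split on newlines and drop blank lines.
        return [line for line in text.splitlines() if line.strip()]


def register(name, splitter_cls):
    # Normalize the key so the registry only ever holds lowercase names.
    chunk_modules[name.lower()] = splitter_cls


def get_splitter(method_name):
    # Assumed lookup behavior: the configured name is lowercased first,
    # so a key stored with uppercase letters would never be found.
    return chunk_modules[method_name.lower()]()


register("Line", LineSplitter)
chunks = get_splitter("LINE").split_text("def a():\n    pass\n\ndef b():\n    pass")
```

If the lookup does normalize names this way, a key such as "Python" stored directly in the dict would be unreachable, which is one plausible reason to keep every key lowercase.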