autorag.data.beta.filter package
Submodules
autorag.data.beta.filter.dontknow module
- class autorag.data.beta.filter.dontknow.Response(*, is_dont_know: bool)
Bases: BaseModel
- is_dont_know: bool
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'is_dont_know': FieldInfo(annotation=bool, required=True)}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
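The single boolean field suggests that the model's verdict is read as structured output and then turned into a keep-or-drop decision. The sketch below illustrates that mapping with a standard-library dataclass standing in for the pydantic model. `DontKnowVerdict`, `parse_verdict`, and `keep_row` are hypothetical names, and the inversion (keep the row when `is_dont_know` is false) is an assumption inferred from the documented return value of the filters in this module, not a confirmed description of AutoRAG's implementation.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DontKnowVerdict:
    """Hypothetical stdlib stand-in for the one-field Response model."""

    is_dont_know: bool


def parse_verdict(payload: str) -> DontKnowVerdict:
    """Parse a JSON reply such as '{"is_dont_know": true}' into a verdict."""
    data = json.loads(payload)
    value = data["is_dont_know"]
    if not isinstance(value, bool):
        raise TypeError("is_dont_know must be a JSON boolean")
    return DontKnowVerdict(is_dont_know=value)


def keep_row(verdict: DontKnowVerdict) -> bool:
    """Assumed mapping: keep the row only when the answer is not a "don't know"."""
    return not verdict.is_dont_know
```

Under these assumptions, a reply of `{"is_dont_know": true}` leads to the row being dropped, and `{"is_dont_know": false}` leads to it being kept.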
- async autorag.data.beta.filter.dontknow.dontknow_filter_llama_index(row: Dict, llm: BaseLLM, lang: str = 'en') -> bool
Filters out rows whose answer is a “don’t know” response, which removes unanswerable questions from the QA dataset. Use this filter with the `batch_filter` function of the QA class.
- Parameters:
row – The row dict from QA dataset.
llm – The LlamaIndex LLM instance. Setting a low max-tokens limit on it is recommended to save tokens.
lang – The language code. Supported values are en and ko.
- Returns:
False if the row's generation_gt amounts to a “don’t know” answer, in which case the row is dropped.
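To make the contract concrete (an async predicate that returns False for rows to drop), here is a minimal sketch that runs such a predicate over a list of rows concurrently and keeps only the rows it approves. Everything in it is a stand-in: `stub_dontknow_filter` matches a couple of hypothetical phrases instead of calling an LLM, `apply_filter` is a hypothetical helper rather than AutoRAG's `batch_filter`, and the `qid` key is assumed purely for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical phrases; the real filter asks an LLM instead of matching text.
DONT_KNOW_PHRASES = ("i don't know", "i do not know")


async def stub_dontknow_filter(row: Dict) -> bool:
    """Stand-in predicate: False when generation_gt reads as "don't know"."""
    answers = row["generation_gt"]
    if isinstance(answers, str):
        answers = [answers]
    return not any(
        phrase in answer.lower()
        for answer in answers
        for phrase in DONT_KNOW_PHRASES
    )


async def apply_filter(
    rows: List[Dict], predicate: Callable[[Dict], Awaitable[bool]]
) -> List[Dict]:
    """Evaluate the predicate on all rows concurrently; keep rows that pass."""
    verdicts = await asyncio.gather(*(predicate(row) for row in rows))
    return [row for row, keep in zip(rows, verdicts) if keep]


rows = [
    {"qid": "q1", "generation_gt": ["Paris is the capital of France."]},
    {"qid": "q2", "generation_gt": ["I don't know."]},
]
kept = asyncio.run(apply_filter(rows, stub_dontknow_filter))
```

With the stub above, only the `q1` row survives. Swapping in a real filter would mean binding its extra arguments (such as the LLM instance and language) before passing it as the predicate.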
- async autorag.data.beta.filter.dontknow.dontknow_filter_openai(row: Dict, client: AsyncOpenAI, model_name: str = 'gpt-4o-mini-2024-07-18', lang: str = 'en') -> bool
Filters out rows whose answer is a “don’t know” response, which removes unanswerable questions from the QA dataset. Use this filter with the `batch_filter` function of the QA class.
- Parameters:
row – The row dict from QA dataset.
client – The OpenAI client.
model_name – The model name. It must be gpt-4o-2024-08-06 or gpt-4o-mini-2024-07-18.
lang – The language code. Supported values are en and ko.
- Returns:
False if the row's generation_gt amounts to a “don’t know” answer, in which case the row is dropped.
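Because the documented model names and language codes are restricted, it can help to check them before issuing any API calls. The helper below is a hypothetical pre-flight check, not part of AutoRAG; its allowed values are copied from the parameter descriptions above, and its defaults mirror the function signature.

```python
# Allowed values taken from the parameter descriptions above.
SUPPORTED_MODELS = ("gpt-4o-2024-08-06", "gpt-4o-mini-2024-07-18")
SUPPORTED_LANGS = ("en", "ko")


def check_dontknow_args(
    model_name: str = "gpt-4o-mini-2024-07-18", lang: str = "en"
) -> None:
    """Hypothetical helper: raise ValueError on undocumented argument values."""
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(
            f"model_name must be one of {SUPPORTED_MODELS}, got {model_name!r}"
        )
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"lang must be one of {SUPPORTED_LANGS}, got {lang!r}")
```

Calling `check_dontknow_args()` with the defaults passes silently, while an unsupported model name or language code fails fast, before any tokens are spent.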