autorag.data.beta.filter package

Submodules

autorag.data.beta.filter.dontknow module

class autorag.data.beta.filter.dontknow.Response(*, is_dont_know: bool)

Bases: BaseModel

is_dont_know: bool

model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[Dict[str, FieldInfo]] = {'is_dont_know': FieldInfo(annotation=bool, required=True)}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

async autorag.data.beta.filter.dontknow.dontknow_filter_llama_index(row: Dict, llm: BaseLLM, lang: str = 'en') → bool

Drops rows whose answer is a “don’t know” response, which removes unanswerable questions from the QA dataset. You can use this filter with the `batch_filter` method of the QA class.

Parameters:

row – The row dict from the QA dataset.
llm – The LlamaIndex LLM instance. Setting a low max-token limit is recommended to save tokens.
lang – The language of the dataset. Supported values are en and ko.

Returns:

False if the row’s generation_gt expresses a “don’t know” answer; True otherwise.
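The return convention above (False means "drop this row") can be illustrated with a minimal sketch. It does not call AutoRAG or any LLM: `stub_dontknow_filter`, `keep_answerable`, and `DONT_KNOW_PHRASES` are hypothetical names introduced here, the stub uses plain phrase matching in place of the LLM judgment, and the row layout (a `generation_gt` list of strings) is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical stand-in for the documented filters. The real functions ask an
# LLM whether the answer means "don't know"; this stub only matches phrases.
DONT_KNOW_PHRASES = ("i don't know", "i do not know")


async def stub_dontknow_filter(row: Dict, lang: str = "en") -> bool:
    answers = row["generation_gt"]  # assumed to be a str or a list of str
    if isinstance(answers, str):
        answers = [answers]
    # Return False when any answer reads as "don't know", so the row is dropped.
    return not any(
        phrase in answer.lower()
        for answer in answers
        for phrase in DONT_KNOW_PHRASES
    )


async def keep_answerable(
    rows: List[Dict], predicate: Callable[[Dict], Awaitable[bool]]
) -> List[Dict]:
    # Evaluate the async predicate on all rows concurrently,
    # then keep only the rows for which it returned True.
    flags = await asyncio.gather(*(predicate(row) for row in rows))
    return [row for row, keep in zip(rows, flags) if keep]


rows = [
    {"query": "What is the capital of France?", "generation_gt": ["Paris."]},
    {"query": "Who will win in 2090?", "generation_gt": ["I don't know."]},
]
kept = asyncio.run(keep_answerable(rows, stub_dontknow_filter))
```

In actual use, the stub would be replaced by one of the documented filter functions and applied through the QA class rather than a hand-written helper.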

async autorag.data.beta.filter.dontknow.dontknow_filter_openai(row: Dict, client: AsyncOpenAI, model_name: str = 'gpt-4o-mini-2024-07-18', lang: str = 'en') → bool