autorag.data.qa.filter package¶

Submodules¶

autorag.data.qa.filter.dontknow module¶

class autorag.data.qa.filter.dontknow.Response(*, is_dont_know: bool)[source]¶

Bases: BaseModel

is_dont_know: bool¶

model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}¶

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}¶

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

model_fields: ClassVar[Dict[str, FieldInfo]] = {'is_dont_know': FieldInfo(annotation=bool, required=True)}¶

Metadata about the fields defined on the model: a mapping of field names to FieldInfo (pydantic.fields.FieldInfo) objects.

This replaces Model.__fields__ from Pydantic V1.

async autorag.data.qa.filter.dontknow.dontknow_filter_llama_index(row: Dict, llm: BaseLLM, lang: str = 'en') → bool[source]¶

This filter drops rows whose answer is “don’t know”, which removes unanswerable questions from the QA dataset. You can use this filter with the batch_filter function of the QA class.

Parameters:

row – The row dict from QA dataset.
llm – The LlamaIndex LLM instance. Setting a low max-token limit is recommended to save tokens.
lang – The language code. Supported values are en, ko, and ja.

Returns:

False if the row’s generation_gt expresses “don’t know” (the row is to be filtered out).
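The return convention above (False means "drop this row") can be illustrated with a small self-contained sketch. It does not import AutoRAG and is not how batch_filter is implemented; apply_row_filter and fake_dontknow_filter are hypothetical names, the substring check merely stands in for the LLM call, and the row layout (a query string plus a list of generation_gt strings) is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# An async row predicate: True means "keep the row", False means "drop it".
RowFilter = Callable[[Dict], Awaitable[bool]]


async def fake_dontknow_filter(row: Dict) -> bool:
    """Hypothetical stand-in for an LLM-backed filter (no model is called)."""
    # Assumption: generation_gt holds a list of answer strings.
    answers = row["generation_gt"]
    return not any("don't know" in answer.lower() for answer in answers)


async def apply_row_filter(rows: List[Dict], fn: RowFilter) -> List[Dict]:
    """Evaluate fn on every row concurrently and keep rows where it is True."""
    keep_flags = await asyncio.gather(*(fn(row) for row in rows))
    return [row for row, keep in zip(rows, keep_flags) if keep]


rows = [
    {"query": "Who wrote Hamlet?", "generation_gt": ["William Shakespeare"]},
    {"query": "What is on page 3?", "generation_gt": ["I don't know."]},
]
kept = asyncio.run(apply_row_filter(rows, fake_dontknow_filter))
print([row["query"] for row in kept])
```

Running the predicates concurrently matters for the LLM-backed filters, since each row costs one model call; the sketch only shows the shape of that pattern.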

async autorag.data.qa.filter.dontknow.dontknow_filter_openai(row: Dict, client: AsyncOpenAI, model_name: str = 'gpt-4o-mini-2024-07-18', lang: str = 'en') → bool[source]¶

This filter drops rows whose answer is “don’t know”, which removes unanswerable questions from the QA dataset. You can use this filter with the batch_filter function of the QA class.

Parameters:

row – The row dict from QA dataset.
client – The OpenAI client.
model_name – The model name. It must be gpt-4o-2024-08-06 or gpt-4o-mini-2024-07-18.
lang – The language code. Supported values are en, ko, and ja.

Returns:

False if the row’s generation_gt expresses “don’t know” (the row is to be filtered out).

autorag.data.qa.filter.dontknow.dontknow_filter_rule_based(row: Dict, lang: str = 'en') → bool[source]¶
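The source gives no description for dontknow_filter_rule_based. As a rough illustration of what a rule-based check with this signature might look like, here is a minimal sketch. The function name rule_based_dontknow_sketch, the phrase lists, and the assumption that generation_gt is a list of strings are all hypothetical; the phrases AutoRAG actually matches are not stated here.

```python
from typing import Dict, List

# Hypothetical phrase lists, chosen only for illustration.
DONT_KNOW_PHRASES: Dict[str, List[str]] = {
    "en": ["i don't know", "i do not know"],
    "ko": ["모르겠습니다"],
    "ja": ["わかりません", "分かりません"],
}


def rule_based_dontknow_sketch(row: Dict, lang: str = "en") -> bool:
    """Return False when any answer contains a known "don't know" phrase."""
    if lang not in DONT_KNOW_PHRASES:
        raise ValueError(f"Unsupported language: {lang}")
    phrases = DONT_KNOW_PHRASES[lang]
    # Assumption: generation_gt holds a list of answer strings.
    answers = row["generation_gt"]
    return not any(
        phrase in answer.lower() for answer in answers for phrase in phrases
    )


print(rule_based_dontknow_sketch({"generation_gt": ["I don't know."]}))
print(rule_based_dontknow_sketch({"generation_gt": ["Paris"]}))
```

A check like this needs no model call, so it is cheap, but it only catches phrasings that appear in the list.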

autorag.data.qa.filter.passage_dependency module¶

class autorag.data.qa.filter.passage_dependency.Response(*, is_passage_dependent: bool)[source]¶

Bases: BaseModel

is_passage_dependent: bool¶

model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}¶

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}¶

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

model_fields: ClassVar[Dict[str, FieldInfo]] = {'is_passage_dependent': FieldInfo(annotation=bool, required=True)}¶

Metadata about the fields defined on the model: a mapping of field names to FieldInfo (pydantic.fields.FieldInfo) objects.

This replaces Model.__fields__ from Pydantic V1.

async autorag.data.qa.filter.passage_dependency.passage_dependency_filter_llama_index(row: Dict, llm: BaseLLM, lang: str = 'en') → bool[source]¶

This filter drops rows whose questions are passage-dependent. A passage-dependent question is one whose answer changes depending on which passage it is asked about. Such questions are poorly suited to RAG evaluation because no retrieval system can find the right passage from the question alone. For example, the answer to “What is the highest score according to the table?” depends on which table is meant, and the question does not say which table that is. You can use this filter with the batch_filter function of the QA class.

Parameters:

row – The row dict from QA dataset.
llm – The LlamaIndex LLM instance. Setting a low max-token limit is recommended to save tokens.
lang – The language code. Supported values are en, ko, and ja.

Returns:

False if the row question is a passage-dependent question (to be filtered).
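The mapping from the model's judgment (is_passage_dependent) to the filter's return value can be sketched without any LLM. In this hypothetical example the Judgment dataclass mirrors the single field of the Response model, keyword_judge is a crude stand-in for the LLM call, and the query column name is an assumption; none of these names come from AutoRAG.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict


@dataclass
class Judgment:
    """Stand-in for the structured response with one boolean field."""
    is_passage_dependent: bool


Judge = Callable[[str], Awaitable[Judgment]]


async def keyword_judge(question: str) -> Judgment:
    """Hypothetical replacement for the LLM: flags references to unnamed context."""
    markers = ("the table", "this passage", "the figure")
    return Judgment(any(marker in question.lower() for marker in markers))


async def passage_dependency_sketch(row: Dict, judge: Judge) -> bool:
    """Return False (drop) when the question is judged passage-dependent."""
    judgment = await judge(row["query"])
    return not judgment.is_passage_dependent


dependent = asyncio.run(passage_dependency_sketch(
    {"query": "What is the highest score according to the table?"}, keyword_judge
))
independent = asyncio.run(passage_dependency_sketch(
    {"query": "What is the boiling point of water at sea level?"}, keyword_judge
))
print(dependent, independent)
```

The negation is the point of the sketch: a positive judgment means the row is dropped, so the filter returns the opposite of is_passage_dependent.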

async autorag.data.qa.filter.passage_dependency.passage_dependency_filter_openai(row: Dict, client: AsyncOpenAI, model_name: str = 'gpt-4o-mini-2024-07-18', lang: str = 'en') → bool[source]¶

Parameters:

row – The row dict from QA dataset.
client – The OpenAI client.
model_name – The model name. It must be gpt-4o-2024-08-06 or gpt-4o-mini-2024-07-18.
lang – The language code. Supported values are en, ko, and ja.

Returns:

False if the row question is a passage-dependent question (to be filtered).

autorag.data.qa.filter.prompt module¶

Module contents¶