autorag.data.qa.filter package

Submodules

autorag.data.qa.filter.dontknow module

class autorag.data.qa.filter.dontknow.Response(*, is_dont_know: bool)

Bases: BaseModel

is_dont_know: bool
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to `pydantic.config.ConfigDict`.

model_fields: ClassVar[Dict[str, FieldInfo]] = {'is_dont_know': FieldInfo(annotation=bool, required=True)}

Metadata about the fields defined on the model: a mapping of field names to `pydantic.fields.FieldInfo` objects.

This replaces Model.__fields__ from Pydantic V1.
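The `Response` model is the structured-output schema for the “don’t know” judgment: a single required boolean. The stdlib-only sketch below shows how such a payload could be turned into a keep/drop decision. It assumes the LLM returns JSON shaped like `{"is_dont_know": true}` and that the filter keeps a row exactly when `is_dont_know` is false; `parse_dont_know` and `keep_row` are hypothetical helpers, not part of AutoRAG.

```python
import json


def parse_dont_know(payload: str) -> bool:
    """Validate a JSON payload against the Response schema (one required bool)."""
    data = json.loads(payload)
    # Mirror model_fields: 'is_dont_know' is required and must be a bool.
    if not isinstance(data, dict) or "is_dont_know" not in data:
        raise ValueError("missing required field 'is_dont_know'")
    value = data["is_dont_know"]
    if not isinstance(value, bool):
        raise TypeError("'is_dont_know' must be a bool")
    return value


def keep_row(payload: str) -> bool:
    # Assumption: the filter returns False (drop) when the answer means "don't know".
    return not parse_dont_know(payload)


print(keep_row('{"is_dont_know": true}'))   # False -> row dropped
print(keep_row('{"is_dont_know": false}'))  # True -> row kept
```

This check is stricter than Pydantic’s default (lax) validation, which would also coerce values such as the string "true" into a bool.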

async autorag.data.qa.filter.dontknow.dontknow_filter_llama_index(row: Dict, llm: BaseLLM, lang: str = 'en') → bool

Drops rows whose answer means “don’t know”, which removes unanswerable questions from the QA dataset. Use this filter with the `batch_filter` method of the QA class.

Parameters:
  • row – The row dict from the QA dataset.

  • llm – The LlamaIndex LLM instance. Setting a low max tokens value is recommended to save tokens.

  • lang – Supported languages: en, ko, and ja.

Returns:

False if the row’s generation_gt means “don’t know” (the row is dropped); otherwise True.
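Because these filters are coroutines that return one `bool` per row, a batch of rows can be screened concurrently. The sketch below is a hypothetical stand-in for batch filtering, not the actual `batch_filter` implementation: `apply_async_filter` and `fake_dontknow_filter` are illustrative names, and the fake filter replaces the LLM call with a string check so the example runs offline. It assumes each row’s `generation_gt` is a list of answer strings.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Row = Dict[str, Any]


async def fake_dontknow_filter(row: Row, lang: str = "en") -> bool:
    # Hypothetical stand-in for dontknow_filter_llama_index: no LLM, just a string check.
    await asyncio.sleep(0)  # yield to the event loop, as a real API call would
    return not any("don't know" in ans.lower() for ans in row["generation_gt"])


async def apply_async_filter(
    rows: List[Row],
    filter_fn: Callable[..., Awaitable[bool]],
    **kwargs: Any,
) -> List[Row]:
    # Run the filter on every row concurrently; gather preserves input order.
    keep_flags = await asyncio.gather(*(filter_fn(row, **kwargs) for row in rows))
    return [row for row, keep in zip(rows, keep_flags) if keep]


rows = [
    {"query": "Who wrote Hamlet?", "generation_gt": ["William Shakespeare"]},
    {"query": "What is in the attic?", "generation_gt": ["I don't know."]},
]
kept = asyncio.run(apply_async_filter(rows, fake_dontknow_filter, lang="en"))
print([row["query"] for row in kept])  # ['Who wrote Hamlet?']
```

With a real LLM you would typically also cap concurrency (for example with `asyncio.Semaphore`) to stay within provider rate limits.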

async autorag.data.qa.filter.dontknow.dontknow_filter_openai(row: Dict, client: AsyncOpenAI, model_name: str = 'gpt-4o-mini-2024-07-18', lang: str = 'en') → bool

Drops rows whose answer means “don’t know”, which removes unanswerable questions from the QA dataset. Use this filter with the `batch_filter` method of the QA class.

Parameters:
  • row – The row dict from the QA dataset.

  • client – The AsyncOpenAI client.

  • model_name – The model name. Must be gpt-4o-2024-08-06 or gpt-4o-mini-2024-07-18.

  • lang – Supported languages: en, ko, and ja.

Returns:

False if the row’s generation_gt means “don’t know” (the row is dropped); otherwise True.
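Because the OpenAI variant accepts only two model snapshots and three languages, a caller may want to validate arguments before issuing any API calls. A minimal sketch: `find_arg_problems` and `validate_filter_args` are hypothetical helpers, and the allowed values are copied from the parameter descriptions above.

```python
from typing import List

SUPPORTED_MODELS = frozenset({"gpt-4o-2024-08-06", "gpt-4o-mini-2024-07-18"})
SUPPORTED_LANGS = frozenset({"en", "ko", "ja"})


def find_arg_problems(model_name: str, lang: str) -> List[str]:
    """Collect every unsupported argument instead of stopping at the first one."""
    problems = []
    if model_name not in SUPPORTED_MODELS:
        problems.append(f"unsupported model_name {model_name!r}")
    if lang not in SUPPORTED_LANGS:
        problems.append(f"unsupported lang {lang!r}")
    return problems


def validate_filter_args(model_name: str = "gpt-4o-mini-2024-07-18", lang: str = "en") -> None:
    # Fail fast, before any tokens are spent on a batch of rows.
    problems = find_arg_problems(model_name, lang)
    if problems:
        raise ValueError("; ".join(problems))


validate_filter_args()                            # defaults pass
validate_filter_args("gpt-4o-2024-08-06", "ko")   # passes
print(find_arg_problems("gpt-3.5-turbo", "fr"))   # two problems reported
```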

autorag.data.qa.filter.dontknow.dontknow_filter_rule_based(row: Dict, lang: str = 'en') → bool
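`dontknow_filter_rule_based` carries no description here. As a rough illustration only, a rule-based check could scan the ground-truth answers for known “don’t know” phrases and skip the LLM entirely. The phrase lists below are illustrative assumptions, not AutoRAG’s actual lists, and `rule_based_dontknow_sketch` is a hypothetical name; it assumes `generation_gt` is a list of strings.

```python
from typing import Any, Dict, List

# Illustrative phrase lists (assumptions, not AutoRAG's real lists).
DONT_KNOW_PHRASES: Dict[str, List[str]] = {
    "en": ["i don't know", "i do not know", "don't know", "do not know"],
    "ko": ["모르겠습니다", "몰라요"],
    "ja": ["わかりません", "分かりません", "知りません"],
}


def rule_based_dontknow_sketch(row: Dict[str, Any], lang: str = "en") -> bool:
    """Return False (drop) if any generation_gt answer contains a "don't know" phrase."""
    phrases = DONT_KNOW_PHRASES[lang]
    # Lowercase and normalize the typographic apostrophe so "Don’t" matches "don't".
    answers = [ans.lower().replace("\u2019", "'") for ans in row["generation_gt"]]
    return not any(phrase in ans for ans in answers for phrase in phrases)


print(rule_based_dontknow_sketch({"generation_gt": ["I don’t know."]}))               # False
print(rule_based_dontknow_sketch({"generation_gt": ["Paris"]}))                       # True
print(rule_based_dontknow_sketch({"generation_gt": ["잘 모르겠습니다."]}, lang="ko"))  # False
```

Substring matching is cheap and deterministic but can misfire on answers that merely quote such a phrase; the LLM-based variants trade cost for that extra judgment.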

autorag.data.qa.filter.passage_dependency module

class autorag.data.qa.filter.passage_dependency.Response(*, is_passage_dependent: bool)

Bases: BaseModel

is_passage_dependent: bool
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to `pydantic.config.ConfigDict`.

model_fields: ClassVar[Dict[str, FieldInfo]] = {'is_passage_dependent': FieldInfo(annotation=bool, required=True)}

Metadata about the fields defined on the model: a mapping of field names to `pydantic.fields.FieldInfo` objects.

This replaces Model.__fields__ from Pydantic V1.

async autorag.data.qa.filter.passage_dependency.passage_dependency_filter_llama_index(row: Dict, llm: BaseLLM, lang: str = 'en') → bool

Drops rows with passage-dependent questions. A passage-dependent question is one whose answer changes depending on which passage it refers to. Such questions are poor material for RAG evaluation because no retrieval system can locate the right passage from the question alone. For example, the answer to “What is the highest score according to the table?” depends on which table is meant, and the question gives the retrieval system nothing to identify that table. Use this filter with the `batch_filter` method of the QA class.

Parameters:
  • row – The row dict from the QA dataset.

  • llm – The LlamaIndex LLM instance. Setting a low max tokens value is recommended to save tokens.

  • lang – Supported languages: en, ko, and ja.

Returns:

False if the row’s question is passage-dependent (the row is dropped); otherwise True.

async autorag.data.qa.filter.passage_dependency.passage_dependency_filter_openai(row: Dict, client: AsyncOpenAI, model_name: str = 'gpt-4o-mini-2024-07-18', lang: str = 'en') → bool

Drops rows with passage-dependent questions. A passage-dependent question is one whose answer changes depending on which passage it refers to. Such questions are poor material for RAG evaluation because no retrieval system can locate the right passage from the question alone. For example, the answer to “What is the highest score according to the table?” depends on which table is meant, and the question gives the retrieval system nothing to identify that table. Use this filter with the `batch_filter` method of the QA class.

Parameters:
  • row – The row dict from the QA dataset.

  • client – The AsyncOpenAI client.

  • model_name – The model name. Must be gpt-4o-2024-08-06 or gpt-4o-mini-2024-07-18.

  • lang – Supported languages: en, ko, and ja.

Returns:

False if the row’s question is passage-dependent (the row is dropped); otherwise True.
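Since each filter returns True to keep a row, the “don’t know” and passage-dependency checks compose with a logical AND: a row survives only if it is answerable and passage-independent. The sketch below uses offline stand-ins (`looks_answerable`, `looks_passage_independent`) in place of the LLM-backed coroutines; both the names and their keyword heuristics are hypothetical, chosen only to make the composition runnable. The AND short-circuits, so a later (possibly costly) check never runs on a row an earlier one already dropped.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Row = Dict[str, Any]
AsyncRowFilter = Callable[[Row], Awaitable[bool]]


async def looks_answerable(row: Row) -> bool:
    # Hypothetical stand-in for a "don't know" filter.
    return not any("don't know" in ans.lower() for ans in row["generation_gt"])


async def looks_passage_independent(row: Row) -> bool:
    # Hypothetical stand-in for a passage-dependency filter: flag deictic phrasing.
    query = row["query"].lower()
    return not any(cue in query for cue in ("according to the table", "in this passage"))


async def passes_all(row: Row, filters: List[AsyncRowFilter]) -> bool:
    # Short-circuit AND: stop at the first filter that says "drop".
    for row_filter in filters:
        if not await row_filter(row):
            return False
    return True


async def filter_rows(rows: List[Row]) -> List[Row]:
    filters = [looks_answerable, looks_passage_independent]
    flags = await asyncio.gather(*(passes_all(row, filters) for row in rows))
    return [row for row, keep in zip(rows, flags) if keep]


rows = [
    {"query": "What is the highest score according to the table?", "generation_gt": ["98"]},
    {"query": "Who founded the company?", "generation_gt": ["I don't know."]},
    {"query": "What year was the Eiffel Tower completed?", "generation_gt": ["1889"]},
]
kept = asyncio.run(filter_rows(rows))
print([row["query"] for row in kept])  # ['What year was the Eiffel Tower completed?']
```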

autorag.data.qa.filter.prompt module

Module contents