autorag.nodes.passagereranker package

Subpackages

Submodules

autorag.nodes.passagereranker.base module

class autorag.nodes.passagereranker.base.BasePassageReranker(project_dir: str | Path, *args, **kwargs)[source]

Bases: BaseModule

cast_to_run(previous_result: DataFrame, *args, **kwargs)[source]

This function casts the previous result into the form expected by a pure function. It is intended only as the cast step (applied via decorator) for the pure functions throughout this node.

autorag.nodes.passagereranker.cohere module

class autorag.nodes.passagereranker.cohere.CohereReranker(project_dir: str, *args, **kwargs)[source]

Bases: BasePassageReranker

pure(previous_result: DataFrame, *args, **kwargs)[source]
async autorag.nodes.passagereranker.cohere.cohere_rerank_pure(cohere_client: AsyncClient, model: str, query: str, documents: List[str], ids: List[str], top_k: int) Tuple[List[str], List[str], List[float]][source]

Rerank a list of contents with Cohere rerank models.

Parameters:
  • cohere_client – The Cohere AsyncClient to use for reranking

  • model – The model name for Cohere rerank

  • query – The query to use for reranking

  • documents – The list of contents to rerank

  • ids – The list of ids corresponding to the documents

  • top_k – The number of passages to be retrieved

Returns:

Tuple of lists containing the reranked contents, ids, and scores
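The return contract of the async rerank helpers in this package — reranked contents, ids, and scores, truncated to top_k — can be illustrated in plain Python. This is a sketch of the output shape only; a real call goes through the Cohere AsyncClient, and the scores below are placeholders:

```python
from typing import List, Tuple

def take_top_k(documents: List[str], ids: List[str],
               scores: List[float], top_k: int) -> Tuple[List[str], List[str], List[float]]:
    # Sort (document, id, score) triples by score, descending, then keep top_k.
    ranked = sorted(zip(documents, ids, scores), key=lambda t: t[2], reverse=True)[:top_k]
    docs, out_ids, out_scores = map(list, zip(*ranked))
    return docs, out_ids, out_scores
```

With made-up scores, `take_top_k(["a", "b", "c"], ["1", "2", "3"], [0.1, 0.9, 0.5], 2)` yields `(["b", "c"], ["2", "3"], [0.9, 0.5])`.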

autorag.nodes.passagereranker.colbert module

class autorag.nodes.passagereranker.colbert.ColbertReranker(project_dir: str, model_name: str = 'colbert-ir/colbertv2.0', *args, **kwargs)[source]

Bases: BasePassageReranker

pure(previous_result: DataFrame, *args, **kwargs)[source]
autorag.nodes.passagereranker.colbert.get_colbert_embedding_batch(input_strings: List[str], model, tokenizer, batch_size: int) List[array][source]
autorag.nodes.passagereranker.colbert.get_colbert_score(query_embedding: array, content_embedding: array) float[source]
autorag.nodes.passagereranker.colbert.slice_tensor(input_tensor, batch_size)[source]
autorag.nodes.passagereranker.colbert.slice_tokenizer_result(tokenizer_output, batch_size)[source]
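ColBERT scores a passage by late interaction: for each query token embedding, take the maximum similarity against any document token embedding, then aggregate over query tokens. A minimal NumPy sketch of this MaxSim formulation (an illustration of the idea, not the library's `get_colbert_score` implementation):

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, content_emb: np.ndarray) -> float:
    # query_emb: (num_query_tokens, dim); content_emb: (num_doc_tokens, dim)
    sim = query_emb @ content_emb.T      # token-level similarity matrix
    return float(sim.max(axis=1).sum())  # best doc token per query token, summed
```

For unit-norm embeddings this reduces to summed per-token maximum cosine similarities.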

autorag.nodes.passagereranker.flag_embedding module

class autorag.nodes.passagereranker.flag_embedding.FlagEmbeddingReranker(project_dir, model_name: str = 'BAAI/bge-reranker-large', *args, **kwargs)[source]

Bases: BasePassageReranker

pure(previous_result: DataFrame, *args, **kwargs)[source]
autorag.nodes.passagereranker.flag_embedding.flag_embedding_run_model(input_texts, model, batch_size: int)[source]

autorag.nodes.passagereranker.flag_embedding_llm module

class autorag.nodes.passagereranker.flag_embedding_llm.FlagEmbeddingLLMReranker(project_dir, model_name: str = 'BAAI/bge-reranker-v2-gemma', *args, **kwargs)[source]

Bases: BasePassageReranker

pure(previous_result: DataFrame, *args, **kwargs)[source]

autorag.nodes.passagereranker.flashrank module

class autorag.nodes.passagereranker.flashrank.FlashRankReranker(project_dir: str, model: str = 'ms-marco-TinyBERT-L-2-v2', *args, **kwargs)[source]

Bases: BasePassageReranker

pure(previous_result: DataFrame, *args, **kwargs)[source]
autorag.nodes.passagereranker.flashrank.flashrank_run_model(input_texts, tokenizer, session, batch_size: int)[source]

autorag.nodes.passagereranker.jina module

class autorag.nodes.passagereranker.jina.JinaReranker(project_dir: str, api_key: str | None = None, *args, **kwargs)[source]

Bases: BasePassageReranker

pure(previous_result: DataFrame, *args, **kwargs)[source]
async autorag.nodes.passagereranker.jina.jina_reranker_pure(session, query: str, contents: List[str], ids: List[str], top_k: int, model: str = 'jina-reranker-v1-base-en') Tuple[List[str], List[str], List[float]][source]

autorag.nodes.passagereranker.koreranker module

class autorag.nodes.passagereranker.koreranker.KoReranker(project_dir: str, *args, **kwargs)[source]

Bases: BasePassageReranker

pure(previous_result: DataFrame, *args, **kwargs)[source]
autorag.nodes.passagereranker.koreranker.exp_normalize(x)[source]
autorag.nodes.passagereranker.koreranker.koreranker_run_model(input_texts, model, tokenizer, device, batch_size: int)[source]
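`exp_normalize` is, in all likelihood, a numerically stable softmax used to turn raw reranker logits into scores; a sketch of that standard pattern (subtract the max before exponentiating to avoid overflow):

```python
import numpy as np

def exp_normalize(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax: shift by the max before exponentiating.
    b = x.max()
    y = np.exp(x - b)
    return y / y.sum()
```

The output is non-negative and sums to 1, so the normalized scores are directly comparable across passages.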

autorag.nodes.passagereranker.mixedbreadai module

class autorag.nodes.passagereranker.mixedbreadai.MixedbreadAIReranker(project_dir: str, *args, **kwargs)[source]

Bases: BasePassageReranker

pure(previous_result: DataFrame, *args, **kwargs)[source]
async autorag.nodes.passagereranker.mixedbreadai.mixedbreadai_rerank_pure(client: AsyncMixedbreadAI, query: str, documents: List[str], ids: List[str], top_k: int, model: str = 'mixedbread-ai/mxbai-rerank-large-v1') Tuple[List[str], List[str], List[float]][source]

Rerank a list of contents with mixedbread-ai rerank models.

Parameters:
  • client – The mixedbread-ai client to use for reranking

  • query – The query to use for reranking

  • documents – The list of contents to rerank

  • ids – The list of ids corresponding to the documents

  • top_k – The number of passages to be retrieved

  • model – The model name for mixedbread-ai rerank. You can choose between “mixedbread-ai/mxbai-rerank-large-v1” and “mixedbread-ai/mxbai-rerank-base-v1”. Default is “mixedbread-ai/mxbai-rerank-large-v1”.

Returns:

Tuple of lists containing the reranked contents, ids, and scores

autorag.nodes.passagereranker.monot5 module

class autorag.nodes.passagereranker.monot5.MonoT5(project_dir: str, model_name: str = 'castorini/monot5-3b-msmarco-10k', *args, **kwargs)[source]

Bases: BasePassageReranker

pure(previous_result: DataFrame, *args, **kwargs)[source]
autorag.nodes.passagereranker.monot5.monot5_run_model(input_texts, model, batch_size: int, tokenizer, device, token_false_id, token_true_id)[source]

autorag.nodes.passagereranker.openvino module

class autorag.nodes.passagereranker.openvino.OpenVINOReranker(project_dir: str, model: str = 'BAAI/bge-reranker-large', *args, **kwargs)[source]

Bases: BasePassageReranker

pure(previous_result: DataFrame, *args, **kwargs)[source]
autorag.nodes.passagereranker.openvino.openvino_run_model(input_texts, model, batch_size: int, tokenizer)[source]

autorag.nodes.passagereranker.pass_reranker module

class autorag.nodes.passagereranker.pass_reranker.PassReranker(project_dir: str | Path, *args, **kwargs)[source]

Bases: BasePassageReranker

pure(previous_result: DataFrame, *args, **kwargs)[source]

autorag.nodes.passagereranker.rankgpt module

class autorag.nodes.passagereranker.rankgpt.AsyncRankGPTRerank(top_n: int = 5, llm: LLM | None = None, verbose: bool = False, rankgpt_rerank_prompt: BasePromptTemplate | None = None)[source]

Bases: RankGPTRerank

async async_postprocess_nodes(nodes: List[NodeWithScore], query_bundle: QueryBundle, ids: List[str] | None = None) Tuple[List[NodeWithScore], List[str]][source]
async async_run_llm(messages: Sequence[ChatMessage]) ChatResponse[source]
llm: LLM
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {'callback_manager': FieldInfo(annotation=CallbackManager, required=False, default_factory=CallbackManager, exclude=True), 'llm': FieldInfo(annotation=LLM, required=False, default_factory=get_default_llm, description='LLM to use for rankGPT'), 'rankgpt_rerank_prompt': FieldInfo(annotation=BasePromptTemplate, required=True, description='rankGPT rerank prompt.', metadata=[SerializeAsAny()]), 'top_n': FieldInfo(annotation=int, required=False, default=5, description='Top N nodes to return from reranking.'), 'verbose': FieldInfo(annotation=bool, required=False, default=False, description='Whether to print intermediate steps.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

rankgpt_rerank_prompt: Annotated[BasePromptTemplate, SerializeAsAny()]
top_n: int
verbose: bool
class autorag.nodes.passagereranker.rankgpt.RankGPT(project_dir: str, llm: str | LLM | None = None, **kwargs)[source]

Bases: BasePassageReranker

pure(previous_result: DataFrame, *args, **kwargs)[source]

autorag.nodes.passagereranker.run module

autorag.nodes.passagereranker.run.run_passage_reranker_node(modules: List, module_params: List[Dict], previous_result: DataFrame, node_line_dir: str, strategies: Dict) DataFrame[source]

Run evaluation and select the best module among passage reranker node results.

Parameters:
  • modules – Passage reranker modules to run.

  • module_params – Passage reranker module parameters.

  • previous_result – Previous result dataframe. It can come from a retrieval or reranker module, and it must contain the ‘query’, ‘retrieved_contents’, ‘retrieved_ids’, and ‘retrieve_scores’ columns.

  • node_line_dir – This node line’s directory.

  • strategies – Strategies for the passage reranker node. This node uses ‘retrieval_f1’, ‘retrieval_recall’, and ‘retrieval_precision’. Evaluation can be skipped when only one module with a single parameter set is used.

Returns:

The best result dataframe with previous result columns.
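The `previous_result` contract described above can be illustrated with a minimal pandas frame. The column names come from the docstring; the row values are placeholders, not real retrieval output:

```python
import pandas as pd

# One row per query; the list-valued columns hold the retrieved passages,
# their ids, and their retrieval scores in rank order.
previous_result = pd.DataFrame({
    "query": ["what is a passage reranker?"],
    "retrieved_contents": [["passage one", "passage two"]],
    "retrieved_ids": [["id-1", "id-2"]],
    "retrieve_scores": [[0.72, 0.41]],
})
```

A frame missing any of these four columns cannot be consumed by the reranker modules in this package.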

autorag.nodes.passagereranker.sentence_transformer module

class autorag.nodes.passagereranker.sentence_transformer.SentenceTransformerReranker(project_dir: str, model_name: str = 'cross-encoder/ms-marco-MiniLM-L-2-v2', *args, **kwargs)[source]

Bases: BasePassageReranker

pure(previous_result: DataFrame, *args, **kwargs)[source]

Rerank a list of contents based on their relevance to a query using a Sentence Transformer model.

Parameters:
  • previous_result – The previous result

  • top_k – The number of passages to be retrieved

  • batch – The number of queries to be processed in a batch

Returns:

A pandas DataFrame containing the reranked contents, ids, and scores

autorag.nodes.passagereranker.sentence_transformer.sentence_transformer_run_model(input_texts, model, batch_size: int)[source]

autorag.nodes.passagereranker.time_reranker module

class autorag.nodes.passagereranker.time_reranker.TimeReranker(project_dir: str, *args, **kwargs)[source]

Bases: BasePassageReranker

pure(previous_result: DataFrame, *args, **kwargs)[source]
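A time reranker of this kind presumably orders passages by recency rather than by relevance score. A sketch of that idea, assuming each passage carries a last-modified timestamp (the metadata field name and shape here are hypothetical, not taken from the library):

```python
from datetime import datetime
from typing import List, Tuple

def rerank_by_time(contents: List[str], ids: List[str],
                   times: List[datetime], top_k: int) -> Tuple[List[str], List[str]]:
    # Keep the top_k most recently modified passages, newest first.
    ranked = sorted(zip(contents, ids, times), key=lambda t: t[2], reverse=True)[:top_k]
    docs, out_ids, _ = zip(*ranked)
    return list(docs), list(out_ids)
```

This is useful when fresher passages should win ties or outrank stale ones outright.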

autorag.nodes.passagereranker.upr module

class autorag.nodes.passagereranker.upr.UPRScorer(suffix_prompt: str, prefix_prompt: str, use_bf16: bool = False)[source]

Bases: object

compute(query: str, contents: List[str]) List[float][source]
class autorag.nodes.passagereranker.upr.Upr(project_dir: str, use_bf16: bool = False, prefix_prompt: str = 'Passage: ', suffix_prompt: str = 'Please write a question based on this passage.', *args, **kwargs)[source]

Bases: BasePassageReranker

pure(previous_result: DataFrame, *args, **kwargs)[source]

autorag.nodes.passagereranker.voyageai module

class autorag.nodes.passagereranker.voyageai.VoyageAIReranker(project_dir: str, *args, **kwargs)[source]

Bases: BasePassageReranker

pure(previous_result: DataFrame, *args, **kwargs)[source]
async autorag.nodes.passagereranker.voyageai.voyageai_rerank_pure(voyage_client: AsyncClient, model: str, query: str, documents: List[str], ids: List[str], top_k: int, truncation: bool = True) Tuple[List[str], List[str], List[float]][source]

Rerank a list of contents with VoyageAI rerank models.

Parameters:
  • voyage_client – The Voyage Client to use for reranking

  • model – The model name for VoyageAI rerank

  • query – The query to use for reranking

  • documents – The list of contents to rerank

  • ids – The list of ids corresponding to the documents

  • top_k – The number of passages to be retrieved

  • truncation – Whether to truncate the query and documents so they fit within the model’s context length limit.

Returns:

Tuple of lists containing the reranked contents, ids, and scores

Module contents