FlashRank Reranker

FlashRank is an ultra-lite, super-fast Python library for adding re-ranking to existing search and retrieval pipelines.

It is based on state-of-the-art (SoTA) cross-encoders, with gratitude to all the model owners.
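A reranker receives the candidates from a first-stage retriever, scores each (query, passage) pair, and reorders them by score. The minimal sketch below shows only that flow. `toy_score` and `rerank` are hypothetical names. `toy_score` uses simple word overlap as a placeholder where FlashRank would run a cross-encoder, so this is not FlashRank's API or scoring method.

```python
import re
from typing import Callable, List, Tuple


def toy_score(query: str, passage: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query words that appear in the passage."""
    q_words = set(re.findall(r"\w+", query.lower()))
    p_words = set(re.findall(r"\w+", passage.lower()))
    if not q_words:
        return 0.0
    return len(q_words & p_words) / len(q_words)


def rerank(
    query: str,
    passages: List[str],
    top_k: int,
    score_fn: Callable[[str, str], float] = toy_score,
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return the top_k passages, best first."""
    scored = [(p, score_fn(query, p)) for p in passages]
    # sorted() is stable, so tied passages keep their first-stage (retriever) order
    scored = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


candidates = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
    "Berlin is the capital of Germany.",
]
result = rerank("capital of France", candidates, top_k=2)
# Paris passage first (score 1.0), then the Berlin passage (score 2/3)
print(result)
```

Passing a different `score_fn` swaps the scorer without touching the sorting logic, which mirrors how a real model slots into the same step.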

Module Parameters

  • batch : The batch size. If CUDA memory is limited, use a smaller batch. (default: 64)

  • model : The model ID or path to use for reranking. (default: “ms-marco-TinyBERT-L-2-v2”)

    • You can get the list of available models from FlashRank.

    Note

    “rank_zephyr_7b_v1_full” is an LLM-based reranker that uses llama-cpp. Because of issues with parallel inference, AutoRAG does not currently support it.
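The `batch` parameter bounds how many passages are scored together, which is what caps peak memory. A minimal sketch of such a batching loop follows, assuming a hypothetical `score_batch` function that scores a list of passages in one call. Here it returns passage length as a placeholder score; in practice it would be the model call. The helper names are illustrative, not AutoRAG internals.

```python
from typing import Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def score_batch(query: str, passages: List[str]) -> List[float]:
    """Hypothetical model call; uses passage length as a placeholder score."""
    return [float(len(p)) for p in passages]


def score_all(query: str, passages: List[str], batch_size: int = 64) -> List[float]:
    """Score all passages, sending at most batch_size of them to the model at once."""
    scores: List[float] = []
    for chunk in batched(passages, batch_size):
        # Only one chunk is scored at a time, so smaller batches lower peak memory
        scores.extend(score_batch(query, chunk))
    return scores


passages = [f"passage {i}" for i in range(10)]
print([len(c) for c in batched(passages, 4)])  # [4, 4, 2]
```

The batch size changes memory use and speed, not the results: scores come back in the original passage order whatever value is chosen.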

Example config.yaml

- module_type: flashrank_reranker
  batch: 32
  model: "ms-marco-MiniLM-L-12-v2"
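How AutoRAG maps this entry to the module internally is not described on this page. As a hedged illustration, the sketch below treats the entry as the dict a YAML loader would produce. It assumes every key other than `module_type` is a module parameter, fills in the defaults listed above, and rejects “rank_zephyr_7b_v1_full” per the note. `module_params` is a hypothetical helper, not part of AutoRAG.

```python
from typing import Any, Dict

# Defaults documented on this page
DEFAULTS: Dict[str, Any] = {"batch": 64, "model": "ms-marco-TinyBERT-L-2-v2"}
# LLM-based reranker that AutoRAG does not currently support
UNSUPPORTED_MODELS = {"rank_zephyr_7b_v1_full"}


def module_params(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Return the reranker parameters from one config entry, filling in documented defaults."""
    if entry.get("module_type") != "flashrank_reranker":
        raise ValueError("not a flashrank_reranker entry")
    # Assumption: everything except module_type is a module parameter
    params = dict(DEFAULTS)
    params.update({k: v for k, v in entry.items() if k != "module_type"})
    if not isinstance(params["batch"], int) or params["batch"] < 1:
        raise ValueError("batch must be a positive integer")
    if params["model"] in UNSUPPORTED_MODELS:
        raise ValueError(f"{params['model']!r} is not currently supported by AutoRAG")
    return params


# The example config.yaml entry, written as a Python dict
entry = {"module_type": "flashrank_reranker", "batch": 32, "model": "ms-marco-MiniLM-L-12-v2"}
print(module_params(entry))  # {'batch': 32, 'model': 'ms-marco-MiniLM-L-12-v2'}
```

Omitting `batch` or `model` from the entry yields the documented defaults (64 and “ms-marco-TinyBERT-L-2-v2”).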