beir-cellar/beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use: evaluate your models across 15+ diverse IR datasets.
BEIR helps developers and researchers working on search engines and recommender systems compare the effectiveness of different information retrieval models. It takes a variety of textual datasets and your trained retrieval model, then outputs standardized performance metrics such as nDCG@k, MAP, and Recall@k, so you can see how well your model retrieves relevant information.
2,105 stars. Used by 9 other packages. Available on PyPI.
Use this if you need to rigorously evaluate and compare different information retrieval models across a wide range of tasks and datasets.
Not ideal if you are looking for a pre-built search engine solution or don't intend to develop and test your own retrieval models.
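To make the reported metrics concrete, here is a minimal stdlib sketch of how nDCG@k is computed for a single query from relevance judgments (qrels) and a model's ranking. This is illustrative only, not BEIR's own implementation (BEIR delegates metric computation to `pytrec_eval`); the doc IDs and judgments are made up for the example.

```python
import math

def ndcg_at_k(qrels, ranked, k=10):
    """nDCG@k for one query.

    qrels:  dict mapping doc_id -> graded relevance (0 = not relevant)
    ranked: list of doc_ids in the order the model returned them (best first)
    """
    # DCG: gain of each retrieved doc, discounted by log2 of its rank position
    dcg = sum(qrels.get(doc, 0) / math.log2(i + 2)
              for i, doc in enumerate(ranked[:k]))
    # IDCG: DCG of the best possible ordering of the judged docs
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

qrels = {"d1": 2, "d3": 1}           # hypothetical graded judgments
ranked = ["d3", "d2", "d1", "d4"]    # hypothetical system ranking
print(round(ndcg_at_k(qrels, ranked), 4))  # → 0.7602
```

BEIR computes this per query and averages across the dataset, which is what makes scores comparable across its 15+ datasets.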
Stars
2,105
Forks
235
Language
Python
License
Apache-2.0
Category
Last pushed
Oct 16, 2025
Commits (30d)
0
Dependencies
3
Reverse dependents
9
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/beir-cellar/beir"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
Related tools
HKUDS/LightRAG
[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"
HKUDS/RAG-Anything
"RAG-Anything: All-in-One RAG Framework"
superlinear-ai/raglite
🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL
illuin-tech/vidore-benchmark
Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.
DataScienceUIBK/Rankify
🔥 Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented...