RulinShao/retrieval-scaling
Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".
This project helps machine learning engineers and researchers build and evaluate retrieval-augmented language models. It takes large text datasets and a retriever model, processing them into massive, efficient datastores. The output includes performance metrics like perplexity and downstream task scores, enabling users to understand how their models perform with different data scales and configurations.
Use this if you are a machine learning engineer or researcher working with retrieval-augmented language models and need tools to efficiently scale datastores, evaluate model performance, or serve pre-built datastores.
Not ideal if you are an end user looking for a pre-packaged application, or if you have no experience in machine learning model development.
Stars: 224
Forks: 18
Language: Python
License: MIT
Category:
Last pushed: Dec 16, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/RulinShao/retrieval-scaling"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
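The same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the endpoint returns JSON, which this page does not state, and the helper names `quality_url` and `fetch_quality` are hypothetical. The fixed `rag` path segment is copied from the curl command above; whether other repositories use the same segment is also an assumption.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl command; the "rag" segment is kept as-is.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(repo_slug: str) -> str:
    """Build the request URL for a repository slug such as 'owner/name'."""
    # Keep the slash between owner and name; escape anything else unusual.
    return f"{BASE_URL}/{urllib.parse.quote(repo_slug, safe='/')}"


def fetch_quality(repo_slug: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON. If it is not, json.loads
    raises a ValueError and the caller should inspect the raw body instead.
    """
    request = urllib.request.Request(
        quality_url(repo_slug),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("RulinShao/retrieval-scaling")` issues one request, which counts against the keyless limit of 100 requests per day, so results are worth caching when polling many repositories.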
Higher-rated alternatives
denser-org/denser-retriever
An enterprise-grade AI retriever designed to streamline AI integration into your applications,...
rayliuca/T-Ragx
Enhancing Translation with RAG-Powered Large Language Models
neuml/rag
🚀 Retrieval Augmented Generation (RAG) with txtai. Combine search and LLMs to find insights with...
NovaSearch-Team/RAG-Retrieval
Unify Efficient Fine-tuning of RAG Retrieval, including Embedding, ColBERT, ReRanker.
MozerWang/Loong
[EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA