RulinShao/retrieval-scaling

Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".

45 / 100 (Emerging)

This project helps machine learning engineers and researchers build and evaluate retrieval-augmented language models. It takes large text datasets and a retriever model, processing them into massive, efficient datastores. The output includes performance metrics like perplexity and downstream task scores, enabling users to understand how their models perform with different data scales and configurations.

224 stars.

Use this if you are a machine learning engineer or researcher working with retrieval-augmented language models and need tools to efficiently scale datastores, evaluate model performance, or serve pre-built datastores.

Not ideal if you are an end-user looking for a pre-packaged application or a non-technical person without experience in machine learning model development.

natural-language-processing large-language-models information-retrieval machine-learning-engineering model-evaluation
No Package, No Dependents
Maintenance 6 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 13 / 25

Stars: 224
Forks: 18
Language: Python
License: MIT
Last pushed: Dec 16, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/RulinShao/retrieval-scaling"

Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
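
The same request can be made from a script. Below is a minimal Python sketch using only the standard library. It assumes the endpoint returns a JSON body and that the URL follows the shape seen in the curl command (category, then owner, then repository). The helper names `build_quality_url` and `fetch_quality` and the `category` parameter are illustrative, not part of any documented client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(owner, repo, category="rag"):
    """Build the endpoint URL in the same shape as the curl example.

    The category/owner/repo layout is inferred from that single URL.
    """
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(owner, repo, category="rag", timeout=10.0):
    """GET the endpoint and parse the response as JSON (assumed format)."""
    url = build_quality_url(owner, repo, category)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("RulinShao", "retrieval-scaling")` issues the request. Since the response fields are not documented here, inspect the returned object before relying on any particular key.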