fresh-stack/freshstack
This repository lets you evaluate your models on the FreshStack benchmark.
It helps AI engineers and researchers build and evaluate benchmarks for information retrieval (IR) and retrieval-augmented generation (RAG) systems: it automatically gathers realistic, niche technical content from sources such as Stack Overflow and GitHub repositories, then provides a framework for testing how well different models find relevant information. You supply a model's retrieval results or an embedding model, and it reports evaluation metrics such as alpha-nDCG, coverage, and recall.
Available on PyPI.
Use this if you need to build benchmarks from up-to-date, community-sourced technical documentation and real user questions, and to assess how well your IR/RAG models perform on them.
Not ideal if you are looking for a general-purpose evaluation tool for domains outside technical information retrieval, or if you don't need to generate custom benchmarks from live data.
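To make the reported metrics concrete, here is a minimal, self-contained Python sketch of nugget-based alpha-nDCG and coverage. It assumes nugget-level relevance judgments (each relevant document covers one or more answer "nuggets"); the toy data and function names are hypothetical, and this is an illustration of the metrics, not the package's actual implementation.

import math
from typing import Dict, List, Set

# Hypothetical nugget-level judgments: each relevant document
# covers one or more answer "nuggets" for the query.
QRELS: Dict[str, Set[str]] = {
    "doc1": {"n1", "n2"},
    "doc2": {"n1"},
    "doc3": {"n3"},
}

def alpha_dcg(ranking: List[str], qrels: Dict[str, Set[str]],
              alpha: float = 0.5, k: int = 10) -> float:
    """DCG where each repeated nugget is discounted by (1 - alpha)."""
    times_seen: Dict[str, int] = {}  # nugget -> times already covered
    score = 0.0
    for rank, doc in enumerate(ranking[:k], start=1):
        gain = 0.0
        for nugget in qrels.get(doc, set()):
            gain += (1 - alpha) ** times_seen.get(nugget, 0)
            times_seen[nugget] = times_seen.get(nugget, 0) + 1
        score += gain / math.log2(rank + 1)
    return score

def alpha_ndcg(ranking: List[str], qrels: Dict[str, Set[str]],
               alpha: float = 0.5, k: int = 10) -> float:
    """Normalize by a greedy 'ideal' ranking built by maximal marginal gain."""
    remaining, times_seen, ideal = set(qrels), {}, []
    for _ in range(min(k, len(remaining))):
        def marginal(doc: str) -> float:
            return sum((1 - alpha) ** times_seen.get(n, 0) for n in qrels[doc])
        best = max(remaining, key=marginal)
        if marginal(best) == 0:
            break
        ideal.append(best)
        remaining.remove(best)
        for n in qrels[best]:
            times_seen[n] = times_seen.get(n, 0) + 1
    idcg = alpha_dcg(ideal, qrels, alpha, k)
    return alpha_dcg(ranking, qrels, alpha, k) / idcg if idcg else 0.0

def coverage(ranking: List[str], qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """Fraction of all judged nuggets covered by the top-k results."""
    all_nuggets = set().union(*qrels.values())
    found = set().union(*(qrels.get(d, set()) for d in ranking[:k]))
    return len(found) / len(all_nuggets)

print(alpha_ndcg(["doc2", "doc1", "doc3"], QRELS, k=3))  # ~0.85 for this toy ranking
print(coverage(["doc2", "doc1", "doc3"], QRELS, k=3))    # 1.0: all nuggets covered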
Stars: 33
Forks: 3
Language: Python
License: Apache-2.0
Category: Embeddings
Last pushed: Dec 09, 2025
Commits (30d): 0
Dependencies: 3
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/fresh-stack/freshstack"
Open to everyone: 100 requests/day with no key needed; a free API key raises the limit to 1,000/day.
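The same call from Python might look like the sketch below, using the requests library. Treating the response as JSON is an assumption, since the response schema isn't documented on this page.

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/embeddings/fresh-stack/freshstack"
resp = requests.get(url, timeout=10)  # no API key needed up to 100 requests/day
resp.raise_for_status()               # fail loudly on HTTP errors
print(resp.json())                    # assumes the endpoint returns JSON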
Higher-rated alternatives
embeddings-benchmark/mteb
MTEB: Massive Text Embedding Benchmark
harmonydata/harmony
The Harmony Python library: a research tool for psychologists to harmonise data and...
yannvgn/laserembeddings
LASER multilingual sentence embeddings as a pip package
embeddings-benchmark/results
Data for the MTEB leaderboard
Hironsan/awesome-embedding-models
A curated list of awesome embedding models, tutorials, projects, and communities.