fresh-stack/freshstack

This repository helps you evaluate your models on the FreshStack benchmark!

Overall score: 46 / 100 (Emerging)

This tool helps AI engineers and researchers build and evaluate benchmarks for information retrieval (IR) and retrieval-augmented generation (RAG) systems. It automatically gathers realistic, niche technical content from sources like Stack Overflow and GitHub repositories, then provides a framework to test how well different models find relevant information. You provide a model's retrieval results or an embedding model, and it reports evaluation metrics such as alpha-nDCG, coverage, and recall.

Available on PyPI.
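
FreshStack's headline metric, alpha-nDCG, rewards rankings that cover many distinct answer "nuggets" while discounting redundant coverage; coverage and recall similarly count nuggets and relevant documents retrieved. As a rough illustration only — the function names and the qrels format below are hypothetical, not the package's actual API — here is a minimal from-scratch sketch:

import math

def alpha_dcg(ranking, qrels, alpha=0.5, k=10):
    # qrels maps doc_id -> set of nugget ids the document covers (hypothetical format)
    seen = {}   # nugget -> times already covered higher in the ranking
    score = 0.0
    for rank, doc in enumerate(ranking[:k], start=1):
        gain = 0.0
        for nugget in qrels.get(doc, ()):
            gain += (1 - alpha) ** seen.get(nugget, 0)  # repeat coverage decays by (1 - alpha)
            seen[nugget] = seen.get(nugget, 0) + 1
        score += gain / math.log2(rank + 1)             # standard log-rank discount
    return score

def alpha_ndcg(ranking, qrels, alpha=0.5, k=10):
    # Computing the ideal ordering exactly is NP-hard; the usual greedy
    # approximation picks the document with the largest marginal gain each step.
    remaining, ideal, seen = list(qrels), [], {}
    while remaining and len(ideal) < k:
        best = max(remaining, key=lambda d: sum((1 - alpha) ** seen.get(n, 0) for n in qrels[d]))
        for n in qrels[best]:
            seen[n] = seen.get(n, 0) + 1
        ideal.append(best)
        remaining.remove(best)
    denom = alpha_dcg(ideal, qrels, alpha, k)
    return alpha_dcg(ranking, qrels, alpha, k) / denom if denom else 0.0

# Toy usage with made-up documents and nuggets:
qrels = {"doc_a": {"n1", "n2"}, "doc_b": {"n1"}, "doc_c": {"n3"}}
print(alpha_ndcg(["doc_a", "doc_b", "doc_c"], qrels, k=3))

This is a sketch of the metric itself, not of the freshstack package; consult the repository for its real interfaces and scoring code.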

Use this if you need to create and assess the performance of your IR/RAG models on up-to-date, community-sourced technical documentation and user-asked questions.

Not ideal if you are looking for a general-purpose model evaluation tool for domains outside of technical information retrieval or if you don't need to generate custom benchmarks from live data.

Tags: AI model evaluation, information retrieval, RAG systems, technical documentation, AI benchmarking
Maintenance: 6 / 25
Adoption: 7 / 25
Maturity: 24 / 25
Community: 9 / 25


Stars: 33
Forks: 3
Language: Python
License: Apache-2.0
Last pushed: Dec 09, 2025
Commits (30d): 0
Dependencies: 3

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/fresh-stack/freshstack"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
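
A Python equivalent of the curl call above, for scripting (this assumes the endpoint returns JSON; the response fields aren't documented on this page):

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/embeddings/fresh-stack/freshstack"
resp = requests.get(url, timeout=10)
resp.raise_for_status()      # surface HTTP errors, e.g. if the daily limit is hit
print(resp.json())           # assumption: the body is a JSON object of the stats above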