fresh-stack/freshstack

This repository helps you evaluate your models on the FreshStack benchmark!

Overall score: 46 / 100 (Emerging)

This tool helps AI engineers and researchers build and evaluate benchmarks for information retrieval (IR) and retrieval-augmented generation (RAG) systems. It automatically gathers realistic, niche technical content from sources like Stack Overflow and GitHub repositories, then provides a framework to test how well different models find relevant information. You provide a model's retrieval results or an embedding model, and it reports evaluation metrics such as alpha-nDCG, coverage, and recall.

Available on PyPI.
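
FreshStack's headline metric, alpha-nDCG, rewards rankings that cover many distinct answer "nuggets" while discounting redundant coverage; coverage and recall similarly count nuggets and relevant documents retrieved. As a rough illustration only — the function names and the qrels format below are hypothetical, not the package's actual API — here is a minimal from-scratch sketch:

import math

def alpha_dcg(ranking, qrels, alpha=0.5, k=10):
    # qrels maps doc_id -> set of nugget ids the document covers (hypothetical format)
    seen = {}   # nugget -> times already covered higher in the ranking
    score = 0.0
    for rank, doc in enumerate(ranking[:k], start=1):
        gain = 0.0
        for nugget in qrels.get(doc, ()):
            gain += (1 - alpha) ** seen.get(nugget, 0)  # repeat coverage decays by (1 - alpha)
            seen[nugget] = seen.get(nugget, 0) + 1
        score += gain / math.log2(rank + 1)             # standard log-rank discount
    return score

def alpha_ndcg(ranking, qrels, alpha=0.5, k=10):
    # Computing the ideal ordering exactly is NP-hard; the usual greedy
    # approximation picks the document with the largest marginal gain each step.
    remaining, ideal, seen = list(qrels), [], {}
    while remaining and len(ideal) < k:
        best = max(remaining, key=lambda d: sum((1 - alpha) ** seen.get(n, 0) for n in qrels[d]))
        for n in qrels[best]:
            seen[n] = seen.get(n, 0) + 1
        ideal.append(best)
        remaining.remove(best)
    denom = alpha_dcg(ideal, qrels, alpha, k)
    return alpha_dcg(ranking, qrels, alpha, k) / denom if denom else 0.0

# Toy usage with made-up documents and nuggets:
qrels = {"doc_a": {"n1", "n2"}, "doc_b": {"n1"}, "doc_c": {"n3"}}
print(alpha_ndcg(["doc_a", "doc_b", "doc_c"], qrels, k=3))

This is a sketch of the metric itself, not of the freshstack package; consult the repository for its real interfaces and scoring code.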

Use this if you need to create and assess the performance of your IR/RAG models on up-to-date, community-sourced technical documentation and user-asked questions.

Not ideal if you are looking for a general-purpose model evaluation tool for domains outside of technical information retrieval or if you don't need to generate custom benchmarks from live data.

Tags: AI model evaluation, information retrieval, RAG systems, technical documentation, AI benchmarking
Maintenance: 6 / 25
Adoption: 7 / 25
Maturity: 24 / 25
Community: 9 / 25


Stars: 33
Forks: 3
Language: Python
License: Apache-2.0
Last pushed: Dec 09, 2025
Commits (30d): 0
Dependencies: 3

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/fresh-stack/freshstack"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
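
A Python equivalent of the curl call above, for scripting (this assumes the endpoint returns JSON; the response fields aren't documented on this page):

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/embeddings/fresh-stack/freshstack"
resp = requests.get(url, timeout=10)
resp.raise_for_status()      # surface HTTP errors, e.g. if the daily limit is hit
print(resp.json())           # assumption: the body is a JSON object of the stats above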