nuclia/nuclia-eval
Library for evaluating RAG using Nuclia's models
This tool helps evaluate the performance of your RAG (Retrieval Augmented Generation) applications. You provide a question, the answer generated by your RAG system, and the source documents (context) it used. The tool then scores three things: how relevant the answer is to the question, how relevant each source document is to the question, and whether the answer is actually grounded in (supported by) those source documents. It is aimed at developers and AI engineers building and refining RAG systems.
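The three scores described above form the usual "RAG triad". As a minimal sketch of how such results might be consumed, here is a hypothetical result shape and a pass/fail summarizer; the class, field, and function names are illustrative assumptions, not nuclia-eval's actual API:

```python
from dataclasses import dataclass

# Hypothetical result shape for a RAG triad evaluation.
# nuclia-eval's real classes and field names may differ.
@dataclass
class RagEvalResult:
    answer_relevance: float          # is the answer relevant to the question?
    context_relevances: list         # is each retrieved passage relevant?
    groundedness: float              # is the answer supported by the contexts?

def summarize(result: RagEvalResult, threshold: float = 0.5) -> dict:
    """Reduce the three raw scores to simple pass/fail flags."""
    return {
        "answer_ok": result.answer_relevance >= threshold,
        "contexts_ok": all(r >= threshold for r in result.context_relevances),
        "grounded": result.groundedness >= threshold,
    }

result = RagEvalResult(
    answer_relevance=0.9,
    context_relevances=[0.8, 0.3],   # second passage scores as irrelevant
    groundedness=0.7,
)
print(summarize(result))
```

This mirrors the workflow the library targets: per-answer relevance, per-context relevance, and groundedness, each reducible to a threshold check.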
No commits in the last 6 months. Available on PyPI.
Use this if you are developing a RAG application and need to objectively measure the quality of its generated answers and the retrieved context.
Not ideal if you are a business user looking for a simple pass/fail judgment on a RAG system without getting into the technical evaluation metrics.
Stars
18
Forks
3
Language
Python
License
MIT
Category
Last pushed
Jul 31, 2024
Commits (30d)
0
Dependencies
5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/nuclia/nuclia-eval"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
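The same request can be built in Python. Only the URL pattern comes from the listing above; the authentication header name used for keyed access is an assumption, so check the API docs before relying on it:

```python
BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-data endpoint URL for a given GitHub repo."""
    return f"{BASE}/{owner}/{repo}"

def request_headers(api_key=None) -> dict:
    # Header name is an assumption; the service may expect a different one.
    return {"Authorization": f"Bearer {api_key}"} if api_key else {}

print(quality_url("nuclia", "nuclia-eval"))
# fetch with: urllib.request.urlopen(quality_url(...)) or requests.get(...)
```

Anonymous requests need no headers at all, matching the keyless 100/day tier.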
Higher-rated alternatives
vectara/open-rag-eval
RAG evaluation without the need for "golden answers"
DocAILab/XRAG
XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced...
HZYAI/RagScore
⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or...
AIAnytime/rag-evaluator
A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).
microsoft/benchmark-qed
Automated benchmarking of Retrieval-Augmented Generation (RAG) systems