AIAnytime/rag-evaluator

A library for evaluating Retrieval-Augmented Generation (RAG) systems the traditional way.

Established

This tool helps you check the quality of answers generated by AI systems, especially systems that combine information retrieval with text generation (RAG systems). You provide the AI's answer, the original question, and a reference answer you consider correct, and the tool scores how well the AI's answer matches that reference. It is aimed at AI developers, researchers, and anyone building or testing conversational AI applications.
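To make reference-based scoring concrete, here is a minimal sketch of one traditional metric: unigram token-overlap F1 between a generated answer and a reference answer, written in plain Python. It is an illustration only, not rag-evaluator's API. The function names `normalize` and `token_f1` are hypothetical, and the library's own metrics and interface may differ.

```python
import re
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens (punctuation is dropped)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(response: str, reference: str) -> float:
    """Hypothetical helper: harmonic mean of unigram precision and recall vs. the reference."""
    resp_tokens = normalize(response)
    ref_tokens = normalize(reference)
    if not resp_tokens or not ref_tokens:
        return 0.0
    # Count shared tokens, respecting multiplicity (each token is matched at most
    # min(count in response, count in reference) times).
    overlap = sum((Counter(resp_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(resp_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "The capital of France is Paris."
    print(token_f1("Paris is the capital of France.", reference))
    print(token_f1("The capital is Lyon.", reference))
```

Note that this metric ignores the question entirely and rewards word overlap rather than meaning, which is why reference-based evaluation typically combines several such scores.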

No commits in the last 6 months. Available on PyPI.

Use this if you are developing or managing AI systems that generate text and need to quantitatively assess the accuracy, coherence, and fairness of their outputs against known good answers.

Not ideal if you're looking for a tool to generate text, fix grammar, or analyze human-written content for sentiment, as it specifically evaluates AI-generated responses.

AI-development NLP-evaluation conversational-AI-testing content-generation-quality

Stale 6m

Maintenance 0 / 25

Adoption 8 / 25

Maturity 25 / 25

Community 19 / 25

Language: Python

License: MIT

Featured in

You're Shipping AI You Can't Measure

Compare

rag-evaluator vs. open-rag-eval, nuclia-eval, rageval, RAG-evaluation-harnesses, and RAG-Evaluator

Related tools

vectara/open-rag-eval

RAG evaluation without the need for "golden answers"

DocAILab/XRAG

XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced...

HZYAI/RagScore

⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or...

microsoft/benchmark-qed

Automated benchmarking of Retrieval-Augmented Generation (RAG) systems

2501Pr0ject/RAGnarok-AI

Local-first RAG evaluation framework for LLM applications. 100% local, no API keys required.
