open-rag-eval and rag-evaluator

These two projects are competitors offering different evaluation methodologies: Vectara's framework enables reference-free evaluation, using LLMs to assess RAG quality directly, while AIAnytime's library implements traditional evaluation, which requires ground-truth golden answers for comparison.

open-rag-eval: 53 (Established)
Maintenance 6/25 · Adoption 10/25 · Maturity 25/25 · Community 12/25
Stars: 347 · Forks: 21 · Downloads: · Commits (30d): 0
Language: Python · License: Apache-2.0
Risk flags: none

rag-evaluator: 52 (Established)
Maintenance 0/25 · Adoption 8/25 · Maturity 25/25 · Community 19/25
Stars: 42 · Forks: 18 · Downloads: · Commits (30d): 0
Language: Python · License: MIT
Risk flags: Stale 6m

About open-rag-eval

vectara/open-rag-eval

RAG evaluation without the need for "golden answers"

This tool helps RAG (Retrieval Augmented Generation) system builders and integrators assess and improve the quality of their AI-powered question-answering systems. You provide a set of questions (queries) and receive detailed performance scores and diagnostic reports, identifying how well your RAG system retrieves relevant information and generates accurate answers. This is for anyone building or maintaining a RAG system, such as AI product managers, machine learning engineers, or solution architects.

AI-powered search Generative AI evaluation RAG system optimization Customer support automation Knowledge base accuracy
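To make the reference-free idea concrete, here is a minimal sketch of the workflow described above: you supply only queries plus the system's retrieved passages and generated answers, and judge functions score retrieval relevance and answer groundedness without any golden answers. The `RagResult`, `evaluate`, and `judge_*` names are illustrative assumptions, not open-rag-eval's actual API; in a real run the judges would be LLM prompts, so deterministic string-match stand-ins are used here.

```python
# Hedged sketch of reference-free RAG evaluation (illustrative, not open-rag-eval's API).
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class RagResult:
    query: str
    passages: List[str]  # passages the RAG system retrieved
    answer: str          # answer the RAG system generated

def evaluate(results: List[RagResult],
             judge_retrieval: Callable[[str, List[str]], float],
             judge_groundedness: Callable[[str, List[str]], float]) -> List[Dict]:
    """Score each result on a 0-1 scale with no golden answers.

    judge_retrieval(query, passages): are the retrieved passages relevant?
    judge_groundedness(answer, passages): is the answer supported by them?
    In practice both judges would be LLM calls; here they are plain functions.
    """
    report = []
    for r in results:
        retrieval = judge_retrieval(r.query, r.passages)
        grounded = judge_groundedness(r.answer, r.passages)
        report.append({
            "query": r.query,
            "retrieval_relevance": retrieval,
            "groundedness": grounded,
            "overall": (retrieval + grounded) / 2,  # simple average for the sketch
        })
    return report

# Deterministic stand-in judges so the sketch runs without an LLM:
naive_retrieval = lambda q, ps: float(any(w in p.lower() for w in q.lower().split() for p in ps))
naive_grounded = lambda a, ps: float(any(a.lower() in p.lower() for p in ps))

results = [RagResult("What license does Vectara use?",
                     ["vectara's open-rag-eval is released under apache-2.0."],
                     "Apache-2.0")]
report = evaluate(results, naive_retrieval, naive_grounded)
print(report[0]["overall"])  # → 1.0
```

The key property this illustrates is that the loop never consults a reference answer: quality is judged directly from the query, the retrieved context, and the generated answer.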

About rag-evaluator

AIAnytime/rag-evaluator

A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).

This tool helps you check the quality of answers generated by AI systems, especially those that combine information retrieval with text generation (RAG systems). You provide an AI's answer, the original question, and a perfect reference answer, and it tells you how good the AI's answer is. This is ideal for AI developers, researchers, and anyone building or testing conversational AI applications.

AI-development NLP-evaluation conversational-AI-testing content-generation-quality
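The "traditional" workflow above can be sketched with a token-overlap F1 score of the kind commonly used for reference-based answer evaluation (SQuAD-style). This is an illustrative stand-in, not rag-evaluator's exact implementation; note that, unlike the reference-free approach, it cannot run without a golden answer.

```python
# Hedged sketch of reference-based answer scoring (illustrative, not rag-evaluator's code).
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and a golden reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    common = Counter(pred) & Counter(ref)  # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

generated = "RAG stands for retrieval augmented generation"
golden = "retrieval augmented generation"
print(round(token_f1(generated, golden), 3))  # → 0.667
```

The score drops as the generated answer adds tokens absent from the reference (lower precision) or omits reference tokens (lower recall), which is exactly why this style of evaluation depends on having a high-quality golden answer.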

Scores updated daily from GitHub, PyPI, and npm data.