open-rag-eval and rageval

These projects are competitors rather than complements. open-rag-eval provides reference-free evaluation metrics suited to production RAG systems, while rageval appears to be a lighter-weight evaluation toolkit. Both target the same use case, so you would choose one or the other instead of combining them.

open-rag-eval: 53 (Established)
Maintenance 6/25, Adoption 10/25, Maturity 25/25, Community 12/25
Stars: 347
Forks: 21
Downloads:
Commits (30d): 0
Language: Python
License: Apache-2.0
No risk flags

rageval: 36 (Emerging)
Maintenance 0/25, Adoption 10/25, Maturity 16/25, Community 10/25
Stars: 170
Forks: 10
Downloads:
Commits (30d): 0
Language: Python
License: Apache-2.0
Risk flags: Stale 6m, No Package, No Dependents
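
The headline scores listed above are consistent with a plain sum of the four 25-point categories (6 + 10 + 25 + 12 = 53 and 0 + 10 + 16 + 10 = 36). The sketch below checks that arithmetic. The summing rule is an assumption inferred from these two rows, not a documented formula, and the names CATEGORY_SCORES and total_score are hypothetical.

```python
# Category scores copied from the comparison above (each is out of 25).
CATEGORY_SCORES = {
    "open-rag-eval": {"Maintenance": 6, "Adoption": 10, "Maturity": 25, "Community": 12},
    "rageval": {"Maintenance": 0, "Adoption": 10, "Maturity": 16, "Community": 10},
}


def total_score(categories: dict[str, int]) -> int:
    """Add up the category scores (assumed aggregation rule, max 100)."""
    return sum(categories.values())


# Compute one headline score per project.
totals = {name: total_score(cats) for name, cats in CATEGORY_SCORES.items()}
print(totals)  # {'open-rag-eval': 53, 'rageval': 36}
```

Under this reading, the 17-point gap between the two projects comes from Maturity (9 points), Maintenance (6 points), and Community (2 points), while Adoption is tied.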

About open-rag-eval

vectara/open-rag-eval

RAG evaluation without the need for "golden answers"

This tool helps RAG (Retrieval Augmented Generation) system builders and integrators assess and improve the quality of their AI-powered question-answering systems. You provide a set of questions (queries) and receive detailed performance scores and diagnostic reports, identifying how well your RAG system retrieves relevant information and generates accurate answers. This is for anyone building or maintaining a RAG system, such as AI product managers, machine learning engineers, or solution architects.

AI-powered search, Generative AI evaluation, RAG system optimization, Customer support automation, Knowledge base accuracy

About rageval

gomate-community/rageval

Evaluation tools for Retrieval-augmented Generation (RAG) methods.

This tool helps evaluate the performance of your Retrieval-Augmented Generation (RAG) systems. It takes the outputs from various stages of your RAG pipeline—like rewritten queries, retrieved documents, and generated answers—and provides comprehensive scores on how well your system is performing across aspects like answer correctness, factual consistency, and document relevance. It is designed for AI/ML engineers or researchers building and refining RAG-based applications.

AI-evaluation, NLP-benchmarking, Generative-AI-testing, LLM-performance, Information-retrieval-quality

Scores updated daily from GitHub, PyPI, and npm data.