open-rag-eval and RAG-evaluation-harnesses

These tools are complementary: open-rag-eval evaluates RAG systems without requiring reference ("golden") answers, while RAG-evaluation-harnesses provides a benchmark suite of question-answering tasks. Because the two take different approaches, practitioners can combine them for a more robust assessment.

open-rag-eval: overall score 53 (Established)

                 open-rag-eval    RAG-evaluation-harnesses
Maintenance      6/25             2/25
Adoption         10/25            6/25
Maturity         25/25            16/25
Community        12/25            11/25
Stars            347              23
Forks            21               3
Downloads
Commits (30d)    0                0
Language         Python           Python
License          Apache-2.0       MIT
Risk flags       None             Stale 6m, No Package, No Dependents

About open-rag-eval

vectara/open-rag-eval

RAG evaluation without the need for "golden answers"

This tool helps RAG (Retrieval Augmented Generation) system builders and integrators assess and improve the quality of their AI-powered question-answering systems. You provide a set of questions (queries) and receive detailed performance scores and diagnostic reports that identify how well your RAG system retrieves relevant information and generates accurate answers. It is aimed at anyone building or maintaining a RAG system, such as AI product managers, machine learning engineers, or solution architects.

AI-powered search · Generative AI evaluation · RAG system optimization · Customer support automation · Knowledge base accuracy
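To make the reference-free idea concrete, here is a minimal sketch of one way to score a RAG answer without a golden reference: measure how well the generated answer is grounded in the retrieved passages via token overlap. This is an illustrative toy metric, not open-rag-eval's actual API or methodology; all function names here are hypothetical.

```python
# Hypothetical sketch of reference-free RAG evaluation (NOT open-rag-eval's API):
# score an answer's groundedness in the retrieved passages by token overlap,
# so no "golden answer" is required.

def tokenize(text: str) -> set[str]:
    """Lowercase, strip basic punctuation, and split into a set of tokens."""
    return {t.strip(".,!?").lower() for t in text.split() if t.strip(".,!?")}

def groundedness(answer: str, passages: list[str]) -> float:
    """Fraction of answer tokens that appear in at least one retrieved passage."""
    answer_tokens = tokenize(answer)
    if not answer_tokens:
        return 0.0
    passage_tokens = set().union(*(tokenize(p) for p in passages))
    return len(answer_tokens & passage_tokens) / len(answer_tokens)

passages = ["Paris is the capital of France.", "France is in Western Europe."]
print(groundedness("Paris is the capital of France", passages))  # 1.0 (fully grounded)
print(groundedness("Berlin is the capital", passages))           # 0.75 ("berlin" unsupported)
```

Production frameworks replace this lexical overlap with LLM-based judgments and report per-dimension scores (retrieval relevance, groundedness, answer quality), but the reference-free principle is the same.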

About RAG-evaluation-harnesses

RulinShao/RAG-evaluation-harnesses

An evaluation suite for Retrieval-Augmented Generation (RAG).

This project helps evaluate how well your Retrieval-Augmented Generation (RAG) system performs on various question-answering tasks. You provide your RAG model's retrieved documents and the questions, and it outputs performance scores. This tool is for researchers, developers, or MLOps engineers who are building and fine-tuning RAG systems and need to rigorously benchmark their effectiveness.

RAG-evaluation · LLM-benchmarking · NLP-research · AI-model-testing · information-retrieval
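The benchmarking workflow described above can be sketched as a small evaluation loop: given questions, their retrieved documents, and a model callable, compute an exact-match score against task labels. This is an assumed toy harness for illustration, not RAG-evaluation-harnesses' actual interface; the function and field names are hypothetical.

```python
# Hypothetical RAG benchmarking loop (NOT RAG-evaluation-harnesses' API):
# score a RAG model's predictions on labeled QA examples via exact match.

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace for a fair string comparison."""
    return " ".join(text.lower().strip().split())

def exact_match_accuracy(examples, generate) -> float:
    """examples: dicts with 'question', 'docs', 'answer' keys (assumed schema).
    generate: callable (question, docs) -> predicted answer string."""
    hits = sum(
        normalize(generate(ex["question"], ex["docs"])) == normalize(ex["answer"])
        for ex in examples
    )
    return hits / len(examples)

# Toy "RAG model" that simply answers with the first retrieved document.
def toy_generate(question, docs):
    return docs[0]

examples = [
    {"question": "Capital of France?", "docs": ["Paris"], "answer": "Paris"},
    {"question": "Capital of Italy?", "docs": ["Milan"], "answer": "Rome"},
]
print(exact_match_accuracy(examples, toy_generate))  # 0.5
```

Real harnesses add per-task metrics (F1, recall@k) and batched inference, but the core loop of feeding questions plus retrieved documents through the model and scoring the outputs is the same.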

Scores updated daily from GitHub, PyPI, and npm data. How scores work