open-rag-eval and RAG-evaluation-harnesses

These tools are complementary: open-rag-eval evaluates RAG systems without requiring reference ("golden") answers, while RAG-evaluation-harnesses provides a benchmark suite of question-answering tasks. Because the two take different approaches, practitioners can combine them for a more robust assessment.

open-rag-eval: overall score 53 (Established)

                 open-rag-eval    RAG-evaluation-harnesses
Maintenance      6/25             2/25
Adoption         10/25            6/25
Maturity         25/25            16/25
Community        12/25            11/25
Stars            347              23
Forks            21               3
Downloads
Commits (30d)    0                0
Language         Python           Python
License          Apache-2.0       MIT
Risk flags       None             Stale 6m, No Package, No Dependents

About open-rag-eval

vectara/open-rag-eval

RAG evaluation without the need for "golden answers"

This tool helps RAG (Retrieval Augmented Generation) system builders and integrators assess and improve the quality of their AI-powered question-answering systems. You provide a set of questions (queries) and receive detailed performance scores and diagnostic reports that identify how well your RAG system retrieves relevant information and generates accurate answers. It is aimed at anyone building or maintaining a RAG system, such as AI product managers, machine learning engineers, or solution architects.

AI-powered search · Generative AI evaluation · RAG system optimization · Customer support automation · Knowledge base accuracy
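To make the reference-free idea concrete, here is a minimal sketch of one way to score a RAG answer without a golden reference: measure how well the generated answer is grounded in the retrieved passages via token overlap. This is an illustrative toy metric, not open-rag-eval's actual API or methodology; all function names here are hypothetical.

```python
# Hypothetical sketch of reference-free RAG evaluation (NOT open-rag-eval's API):
# score an answer's groundedness in the retrieved passages by token overlap,
# so no "golden answer" is required.

def tokenize(text: str) -> set[str]:
    """Lowercase, strip basic punctuation, and split into a set of tokens."""
    return {t.strip(".,!?").lower() for t in text.split() if t.strip(".,!?")}

def groundedness(answer: str, passages: list[str]) -> float:
    """Fraction of answer tokens that appear in at least one retrieved passage."""
    answer_tokens = tokenize(answer)
    if not answer_tokens:
        return 0.0
    passage_tokens = set().union(*(tokenize(p) for p in passages))
    return len(answer_tokens & passage_tokens) / len(answer_tokens)

passages = ["Paris is the capital of France.", "France is in Western Europe."]
print(groundedness("Paris is the capital of France", passages))  # 1.0 (fully grounded)
print(groundedness("Berlin is the capital", passages))           # 0.75 ("berlin" unsupported)
```

Production frameworks replace this lexical overlap with LLM-based judgments and report per-dimension scores (retrieval relevance, groundedness, answer quality), but the reference-free principle is the same.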

About RAG-evaluation-harnesses

RulinShao/RAG-evaluation-harnesses

An evaluation suite for Retrieval-Augmented Generation (RAG).

This project helps evaluate how well your Retrieval-Augmented Generation (RAG) system performs on various question-answering tasks. You provide your RAG model's retrieved documents and the questions, and it outputs performance scores. This tool is for researchers, developers, or MLOps engineers who are building and fine-tuning RAG systems and need to rigorously benchmark their effectiveness.

RAG-evaluation · LLM-benchmarking · NLP-research · AI-model-testing · information-retrieval
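The benchmarking workflow described above can be sketched as a small evaluation loop: given questions, their retrieved documents, and a model callable, compute an exact-match score against task labels. This is an assumed toy harness for illustration, not RAG-evaluation-harnesses' actual interface; the function and field names are hypothetical.

```python
# Hypothetical RAG benchmarking loop (NOT RAG-evaluation-harnesses' API):
# score a RAG model's predictions on labeled QA examples via exact match.

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace for a fair string comparison."""
    return " ".join(text.lower().strip().split())

def exact_match_accuracy(examples, generate) -> float:
    """examples: dicts with 'question', 'docs', 'answer' keys (assumed schema).
    generate: callable (question, docs) -> predicted answer string."""
    hits = sum(
        normalize(generate(ex["question"], ex["docs"])) == normalize(ex["answer"])
        for ex in examples
    )
    return hits / len(examples)

# Toy "RAG model" that simply answers with the first retrieved document.
def toy_generate(question, docs):
    return docs[0]

examples = [
    {"question": "Capital of France?", "docs": ["Paris"], "answer": "Paris"},
    {"question": "Capital of Italy?", "docs": ["Milan"], "answer": "Rome"},
]
print(exact_match_accuracy(examples, toy_generate))  # 0.5
```

Real harnesses add per-task metrics (F1, recall@k) and batched inference, but the core loop of feeding questions plus retrieved documents through the model and scoring the outputs is the same.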

Scores updated daily from GitHub, PyPI, and npm data. How scores work