open-rag-eval and RAG-evaluation-harnesses
These are complements: open-rag-eval evaluates RAG systems without requiring reference ("golden") answers, while RAG-evaluation-harnesses benchmarks RAG outputs on established question-answering tasks. Used together, they let practitioners pair reference-free diagnostics with standardized benchmark scores for a more robust assessment.
About open-rag-eval
vectara/open-rag-eval
RAG evaluation without the need for "golden answers"
This tool helps RAG (Retrieval Augmented Generation) system builders and integrators assess and improve the quality of their AI-powered question-answering systems. You provide a set of questions (queries) and receive detailed performance scores and diagnostic reports that show how well your RAG system retrieves relevant information and generates accurate answers. It is aimed at anyone building or maintaining a RAG system, such as AI product managers, machine learning engineers, or solution architects.
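To make the "queries in, scores out" workflow concrete, here is a minimal reference-free evaluation sketch. It is not open-rag-eval's actual API: RAGResult, evaluate_reference_free, judge_relevance, and judge_groundedness are hypothetical names used only to illustrate scoring retrieval relevance and answer groundedness without golden answers; in practice the judge functions would typically be LLM-based rather than the toy keyword overlap shown here.

```python
# Hypothetical sketch of a reference-free RAG evaluation loop.
# None of these names come from open-rag-eval; they only illustrate the
# "queries in, per-query scores out" workflow such a tool automates.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RAGResult:
    query: str
    passages: List[str]   # what the retriever returned
    answer: str           # what the generator produced


def evaluate_reference_free(
    results: List[RAGResult],
    judge_relevance: Callable[[str, str], float],          # passage vs. query, 0..1
    judge_groundedness: Callable[[str, List[str]], float],  # answer vs. passages, 0..1
) -> List[dict]:
    """Score each query without golden answers: rate retrieval relevance
    and answer groundedness using judge functions (e.g. an LLM-as-judge)."""
    report = []
    for r in results:
        retrieval = (
            sum(judge_relevance(r.query, p) for p in r.passages) / len(r.passages)
            if r.passages else 0.0
        )
        grounded = judge_groundedness(r.answer, r.passages) if r.passages else 0.0
        report.append({
            "query": r.query,
            "retrieval_relevance": round(retrieval, 3),
            "answer_groundedness": round(grounded, 3),
        })
    return report


if __name__ == "__main__":
    # Trivial keyword-overlap judges stand in for real LLM-based judges.
    def overlap(a: str, b: str) -> float:
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / max(len(wa), 1)

    demo = [RAGResult(
        query="What is retrieval augmented generation?",
        passages=["Retrieval augmented generation combines search with an LLM."],
        answer="It combines retrieval with generation to answer questions.",
    )]
    print(evaluate_reference_free(
        demo,
        judge_relevance=lambda q, p: overlap(q, p),
        judge_groundedness=lambda ans, ps: max(overlap(ans, p) for p in ps),
    ))
```

The point of the sketch is the shape of the output: one row per query, with separate retrieval and generation scores, which is what makes the reports useful for diagnosing where a pipeline fails.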
About RAG-evaluation-harnesses
RulinShao/RAG-evaluation-harnesses
An evaluation suite for Retrieval-Augmented Generation (RAG).
This project helps evaluate how well your Retrieval-Augmented Generation (RAG) system performs on various question-answering tasks. You provide your RAG model's retrieved documents and the questions, and it outputs performance scores. This tool is for researchers, developers, or MLOps engineers who are building and fine-tuning RAG systems and need to rigorously benchmark their effectiveness.
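For contrast, benchmark-style harnesses typically compare generated answers against gold answers from QA datasets using metrics such as exact match and token-level F1. The sketch below shows those two metrics only as an illustration of prediction-vs-gold scoring; the field names and the simple normalization are assumptions, not the harness's actual implementation, and the real suite covers many more tasks and metrics.

```python
# Minimal sketch of benchmark-style QA scoring (exact match and token F1),
# the kind of metrics an evaluation harness reports for RAG answers.
# Field names and the normalization rules here are illustrative only.
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, golds: List[str]) -> float:
    return float(any(normalize(prediction) == normalize(g) for g in golds))


def token_f1(prediction: str, golds: List[str]) -> float:
    def f1(pred: str, gold: str) -> float:
        p, g = normalize(pred).split(), normalize(gold).split()
        common = Counter(p) & Counter(g)
        overlap = sum(common.values())
        if overlap == 0:
            return 0.0
        precision, recall = overlap / len(p), overlap / len(g)
        return 2 * precision * recall / (precision + recall)

    return max(f1(prediction, g) for g in golds)


if __name__ == "__main__":
    examples = [
        {"prediction": "Paris", "answers": ["Paris", "the city of Paris"]},
        {"prediction": "It is in France", "answers": ["France"]},
    ]
    em = sum(exact_match(e["prediction"], e["answers"]) for e in examples) / len(examples)
    f1 = sum(token_f1(e["prediction"], e["answers"]) for e in examples) / len(examples)
    print(f"exact_match={em:.2f}  f1={f1:.2f}")
```

Aggregating such per-example scores across a benchmark is what produces the single headline numbers used to compare RAG configurations against one another.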