IAAR-Shanghai/CRUD_RAG
CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models
This project helps developers and researchers evaluate the performance of Retrieval-Augmented Generation (RAG) systems on Chinese-language data. You provide your RAG system, run it against a large set of Chinese news documents, and the project outputs metrics showing how well the system retrieves relevant information and generates accurate, coherent responses. It's designed for AI/ML engineers, natural language processing researchers, and RAG system developers.
362 stars. No commits in the last 6 months.
Use this if you are building or researching RAG systems and need a robust benchmark to assess their capabilities, especially with Chinese text.
Not ideal if you are an end-user looking for a ready-to-use RAG application, or if you are not comfortable modifying code and setting up language model APIs.
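To make the plug-in-and-evaluate workflow above concrete, here is a minimal Python sketch of an evaluation loop of this shape. Every name in it (EvalItem, MyRAGSystem, score_answer) is a hypothetical placeholder, not the repository's actual API; check the repo's own scripts for the real entry points and metrics.

# Hypothetical sketch of a RAG evaluation loop. None of these names
# come from the CRUD_RAG repo; they only illustrate the shape of
# plugging your system into a benchmark and scoring its answers.
from dataclasses import dataclass

@dataclass
class EvalItem:
    question: str   # Chinese question drawn from a news corpus
    reference: str  # gold answer used for scoring

class MyRAGSystem:
    """Stand-in for your own retriever + generator pipeline."""
    def answer(self, question: str) -> str:
        # Retrieve relevant passages, then generate a response.
        return "示例答案"

def score_answer(prediction: str, reference: str) -> float:
    """Toy character-overlap score; a real benchmark uses richer metrics."""
    pred, ref = set(prediction), set(reference)
    return len(pred & ref) / max(len(ref), 1)

def evaluate(system: MyRAGSystem, items: list[EvalItem]) -> float:
    scores = [score_answer(system.answer(it.question), it.reference)
              for it in items]
    return sum(scores) / max(len(items), 1)

if __name__ == "__main__":
    items = [EvalItem(question="示例问题", reference="示例答案")]
    print(f"mean score: {evaluate(MyRAGSystem(), items):.3f}")

A real run would swap the toy overlap score for the benchmark's own metrics and load its questions from the Chinese news corpus rather than a hand-written list.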
Stars: 362
Forks: 28
Language: Python
License: —
Category: —
Last pushed: May 20, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/IAAR-Shanghai/CRUD_RAG"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
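If you would rather call the endpoint from code, a minimal Python sketch follows. The URL is taken verbatim from the curl command above; the JSON field names are not a documented schema, so the example just dumps whatever the endpoint returns.

# Fetch the quality data for this repo using only the standard library.
import json
import urllib.request

# Endpoint copied verbatim from the curl command above.
URL = "https://pt-edge.onrender.com/api/v1/quality/rag/IAAR-Shanghai/CRUD_RAG"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# Dump the payload as-is; key names such as "stars" or "forks" would be
# assumptions based on the stats shown above, not a documented schema.
print(json.dumps(data, indent=2, ensure_ascii=False))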
Higher-rated alternatives
vectara/open-rag-eval
RAG evaluation without the need for "golden answers"
DocAILab/XRAG
XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced...
HZYAI/RagScore
⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or...
AIAnytime/rag-evaluator
A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).
microsoft/benchmark-qed
Automated benchmarking of Retrieval-Augmented Generation (RAG) systems