IAAR-Shanghai/CRUD_RAG
CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models
This project helps developers and researchers evaluate the performance of Retrieval-Augmented Generation (RAG) systems on Chinese-language data. You provide your RAG system, run it against a large set of Chinese news documents, and the project outputs metrics showing how well the system retrieves relevant information and generates accurate, coherent responses. It's designed for AI/ML engineers, natural language processing researchers, and RAG system developers.
362 stars. No commits in the last 6 months.
Use this if you are building or researching RAG systems and need a robust benchmark to assess their capabilities, especially with Chinese text.
Not ideal if you are an end-user looking for a ready-to-use RAG application, or if you are not comfortable modifying code and setting up language model APIs.
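To make the plug-in-and-evaluate workflow above concrete, here is a minimal Python sketch of an evaluation loop of this shape. Every name in it (EvalItem, MyRAGSystem, score_answer) is a hypothetical placeholder, not the repository's actual API; check the repo's own scripts for the real entry points and metrics.

# Hypothetical sketch of a RAG evaluation loop. None of these names
# come from the CRUD_RAG repo; they only illustrate the shape of
# plugging your system into a benchmark and scoring its answers.
from dataclasses import dataclass

@dataclass
class EvalItem:
    question: str   # Chinese question drawn from a news corpus
    reference: str  # gold answer used for scoring

class MyRAGSystem:
    """Stand-in for your own retriever + generator pipeline."""
    def answer(self, question: str) -> str:
        # Retrieve relevant passages, then generate a response.
        return "示例答案"

def score_answer(prediction: str, reference: str) -> float:
    """Toy character-overlap score; a real benchmark uses richer metrics."""
    pred, ref = set(prediction), set(reference)
    return len(pred & ref) / max(len(ref), 1)

def evaluate(system: MyRAGSystem, items: list[EvalItem]) -> float:
    scores = [score_answer(system.answer(it.question), it.reference)
              for it in items]
    return sum(scores) / max(len(items), 1)

if __name__ == "__main__":
    items = [EvalItem(question="示例问题", reference="示例答案")]
    print(f"mean score: {evaluate(MyRAGSystem(), items):.3f}")

A real run would swap the toy overlap score for the benchmark's own metrics and load its questions from the Chinese news corpus rather than a hand-written list.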
Stars: 362
Forks: 28
Language: Python
License: —
Category: —
Last pushed: May 20, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/IAAR-Shanghai/CRUD_RAG"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
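If you would rather call the endpoint from code, a minimal Python sketch follows. The URL is taken verbatim from the curl command above; the JSON field names are not a documented schema, so the example just dumps whatever the endpoint returns.

# Fetch the quality data for this repo using only the standard library.
import json
import urllib.request

# Endpoint copied verbatim from the curl command above.
URL = "https://pt-edge.onrender.com/api/v1/quality/rag/IAAR-Shanghai/CRUD_RAG"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# Dump the payload as-is; key names such as "stars" or "forks" would be
# assumptions based on the stats shown above, not a documented schema.
print(json.dumps(data, indent=2, ensure_ascii=False))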
Higher-rated alternatives
vectara/open-rag-eval
RAG evaluation without the need for "golden answers"
DocAILab/XRAG
XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced...
HZYAI/RagScore
⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or...
AIAnytime/rag-evaluator
A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).
microsoft/benchmark-qed
Automated benchmarking of Retrieval-Augmented Generation (RAG) systems