izam-mohammed/ragrank
🎯 A free LLM evaluation toolkit that assesses factual accuracy, contextual understanding, response tone, and more, so you can gauge the quality of your LLM applications.
This toolkit assesses the performance of Retrieval-Augmented Generation (RAG) applications: you provide the questions posed to your RAG pipeline, the contexts it retrieves, and the responses it generates, and it returns metrics on factual accuracy, context understanding, and tone. It is aimed at AI/ML engineers, data scientists, and product managers who build and deploy LLM applications and need assurance that their RAG systems deliver high-quality, reliable outputs.
Use this if you are developing RAG-based LLM applications and need to systematically measure and improve their factual accuracy, contextual understanding, and overall response quality.
Not ideal if you want to evaluate foundation LLMs directly rather than the end-to-end performance of a RAG system.
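Below is a minimal sketch of what an evaluation call could look like. It follows the quickstart pattern in the project's README, but the `evaluate` and `from_dict` names, the `ragrank.dataset` module path, and the dataset field names are assumptions taken from that pattern; check the repo for the current API.

```python
# A hedged sketch, not a verified snippet: the names below follow the
# pattern shown in ragrank's README and may have changed since.
from ragrank import evaluate
from ragrank.dataset import from_dict

# Build a single-row dataset: the question asked, the contexts the
# retriever returned, and the response the LLM generated.
data = from_dict({
    "question": "What license does ragrank use?",
    "context": ["ragrank is released under the Apache-2.0 license."],
    "response": "It is licensed under Apache-2.0.",
})

# Run the default metric suite and inspect the scores.
result = evaluate(data)
print(result.to_dataframe())
```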
Stars: 45
Forks: 14
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 14, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/izam-mohammed/ragrank"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
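If you would rather call the endpoint from Python than curl, here is a minimal sketch using the requests library. The URL comes from the command above; the JSON response schema and the mechanism for passing an API key are not shown on this page, so inspect the actual response.

```python
import requests

# Endpoint copied verbatim from the curl example above.
URL = "https://pt-edge.onrender.com/api/v1/quality/rag/izam-mohammed/ragrank"

# No key is needed for up to 100 requests/day. How a free key is
# attached (header vs. query parameter) is not documented here,
# so this sketch makes the unauthenticated call.
resp = requests.get(URL, timeout=10)
resp.raise_for_status()

# The field names in the JSON are unknown; print it to see them.
print(resp.json())
```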
Related tools
modelscope/evalscope
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation...
Kareem-Rashed/rubric-eval
Independent framework to test, benchmark, and evaluate LLMs & AI agents locally.
justplus/llm-eval
A large language model evaluation platform supporting multiple evaluation benchmarks, custom datasets, and performance testing. Supports RAG evaluation based on custom datasets.
relari-ai/continuous-eval
Data-Driven Evaluation for LLM-Powered Applications
cleanlab/tlm
Score the trustworthiness of outputs from any LLM in real-time