amazon-science/auto-rag-eval
Code repo for the ICML 2024 paper "Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation"
This tool helps you assess how well your question-answering system performs by generating customized, multiple-choice exams directly from your specific knowledge documents. You provide your business's proprietary information, and it creates a relevant exam. The output is a rigorous evaluation of your AI's ability to answer questions based on that content, revealing where it excels or struggles. This is ideal for AI product managers, knowledge base owners, and content strategists who want to validate their AI's understanding of their unique domain.
No commits in the last 6 months.
Use this if you need to objectively measure the accuracy and relevance of your AI's responses to questions drawn from your own unique documentation or data.
Not ideal if you are looking for a general-purpose AI evaluation tool that doesn't focus on domain-specific question generation.
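To make the evaluation concrete: the generated exam is scored like any multiple-choice test, with the QA system picking one option per question. The sketch below is illustrative only and does not use auto-rag-eval's actual API; `ask_model` and the sample question are hypothetical stand-ins for the system under test and the generated exam items.

```python
def ask_model(question: str, choices: list[str]) -> str:
    """Stand-in for the QA system under test; always guesses "A" here."""
    return "A"

# Hypothetical exam items of the kind the tool generates from your documents.
exam = [
    {"question": "Which port does the ingest service use?",
     "choices": ["A) 80", "B) 8080", "C) 443", "D) 9000"],
     "answer": "B"},
    # ... more generated questions
]

# Score the system: fraction of questions where its choice matches the key.
correct = sum(ask_model(q["question"], q["choices"]) == q["answer"] for q in exam)
print(f"Exam accuracy: {correct / len(exam):.1%}")
```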
Stars: 86
Forks: 13
Language: Python
License: Apache-2.0
Category: RAG
Last pushed: Jun 13, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/amazon-science/auto-rag-eval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
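As a quick sketch, the same endpoint can be queried from Python. The response is assumed to be JSON; the field names below are assumptions about its shape, not a documented schema.

```python
import requests

# Public endpoint from the curl example above; 100 requests/day without a key.
URL = "https://pt-edge.onrender.com/api/v1/quality/rag/amazon-science/auto-rag-eval"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # fail loudly on rate limiting or server errors

data = resp.json()
# Assumed field names for illustration; inspect `data` for the real keys.
print(data.get("stars"), data.get("last_pushed"))
```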
Related tools
ibm-self-serve-assets/JudgeIt-LLM-as-a-Judge
Automation Framework using LLM-as-a-judge to evaluate Agentic AI, RAG, Text2SQL at scale;...
explore-de/rage4j
Evaluate your LLM-based Java apps
mit-ll-ai-technology/llm-sandbox
Large language model evaluation framework for logic and open-ended Q&A with a variety of RAG and...
nl4opt/ORQA
[AAAI 2025] ORQA is a new QA benchmark designed to assess the reasoning capabilities of LLMs in...