amazon-science/auto-rag-eval
Code repo for the ICML 2024 paper "Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation"
This tool helps you assess how well your question-answering system performs by generating customized, multiple-choice exams directly from your specific knowledge documents. You provide your business's proprietary information, and it creates a relevant exam. The output is a rigorous evaluation of your AI's ability to answer questions based on that content, revealing where it excels or struggles. This is ideal for AI product managers, knowledge base owners, and content strategists who want to validate their AI's understanding of their unique domain.
No commits in the last 6 months.
Use this if you need to objectively measure the accuracy and relevance of your AI's responses to questions drawn from your own unique documentation or data.
Not ideal if you are looking for a general-purpose AI evaluation tool that doesn't focus on domain-specific question generation.
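To make the evaluation concrete: the generated exam is scored like any multiple-choice test, with the QA system picking one option per question. The sketch below is illustrative only and does not use auto-rag-eval's actual API; `ask_model` and the sample question are hypothetical stand-ins for the system under test and the generated exam items.

```python
def ask_model(question: str, choices: list[str]) -> str:
    """Stand-in for the QA system under test; always guesses "A" here."""
    return "A"

# Hypothetical exam items of the kind the tool generates from your documents.
exam = [
    {"question": "Which port does the ingest service use?",
     "choices": ["A) 80", "B) 8080", "C) 443", "D) 9000"],
     "answer": "B"},
    # ... more generated questions
]

# Score the system: fraction of questions where its choice matches the key.
correct = sum(ask_model(q["question"], q["choices"]) == q["answer"] for q in exam)
print(f"Exam accuracy: {correct / len(exam):.1%}")
```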
Stars: 86
Forks: 13
Language: Python
License: Apache-2.0
Category: RAG
Last pushed: Jun 13, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/amazon-science/auto-rag-eval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
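As a quick sketch, the same endpoint can be queried from Python. The response is assumed to be JSON; the field names below are assumptions about its shape, not a documented schema.

```python
import requests

# Public endpoint from the curl example above; 100 requests/day without a key.
URL = "https://pt-edge.onrender.com/api/v1/quality/rag/amazon-science/auto-rag-eval"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # fail loudly on rate limiting or server errors

data = resp.json()
# Assumed field names for illustration; inspect `data` for the real keys.
print(data.get("stars"), data.get("last_pushed"))
```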
Related tools
ibm-self-serve-assets/JudgeIt-LLM-as-a-Judge
Automation Framework using LLM-as-a-judge to evaluate Agentic AI, RAG, Text2SQL at scale;...
explore-de/rage4j
Evaluate your LLM-based Java apps
mit-ll-ai-technology/llm-sandbox
Large language model evaluation framework for logic and open-ended Q&A with a variety of RAG and...
nl4opt/ORQA
[AAAI 2025] ORQA is a new QA benchmark designed to assess the reasoning capabilities of LLMs in...