amazon-science/auto-rag-eval

Code repo for the ICML 2024 paper "Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation"

Quality score: 41 / 100 (Emerging)

This tool helps you assess how well your question-answering system performs by generating customized multiple-choice exams directly from your own knowledge documents. You provide your business's proprietary content, and it creates an exam tailored to it. The output is a rigorous evaluation of your AI's ability to answer questions based on that content, revealing where it excels or struggles. It is ideal for AI product managers, knowledge base owners, and content strategists who want to validate their AI's understanding of their unique domain.
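
To make the idea concrete, here is a minimal sketch of exam generation and grading. It is not the repo's actual interface: generate_mcq, grade, the prompt wording, the JSON schema, and the llm_complete / rag_answer callables are all hypothetical placeholders for your own LLM client and RAG system.

import json

EXAM_PROMPT = """Given the documentation below, write one multiple-choice
question with four options (A-D) and mark the correct answer.
Return JSON: {{"question": "...", "options": ["...", "...", "...", "..."], "answer": "A"}}

Documentation:
{chunk}
"""

def generate_mcq(chunk: str, llm_complete) -> dict:
    # llm_complete is a placeholder: any callable that takes a prompt
    # string and returns the model's text completion.
    raw = llm_complete(EXAM_PROMPT.format(chunk=chunk))
    return json.loads(raw)  # assumes the model returned valid JSON

def grade(exam: list[dict], rag_answer) -> float:
    # rag_answer(question, options) is a placeholder for your RAG
    # system; it should return one of "A"-"D".
    correct = sum(
        rag_answer(q["question"], q["options"]) == q["answer"] for q in exam
    )
    return correct / len(exam)

In the paper's setting, the exam generated from your corpus is then used to score the retrieval-augmented system question by question, as grade does above.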

No commits in the last 6 months.

Use this if you need to objectively measure the accuracy and relevance of your AI's responses to questions drawn from your own unique documentation or data.

Not ideal if you are looking for a general-purpose AI evaluation tool that doesn't focus on domain-specific question generation.

Tags: AI-evaluation, knowledge-base-management, content-validation, AI-product-management, customer-support-AI
Flags: Stale (6m) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 9 / 25
Maturity: 16 / 25
Community: 16 / 25


Stars: 86
Forks: 13
Language: Python
License: Apache-2.0
Last pushed: Jun 13, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/amazon-science/auto-rag-eval"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
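
If you prefer to call the endpoint from Python rather than curl, here is a minimal sketch. The response schema is not documented on this page, so inspect the JSON before relying on specific fields.

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/rag/amazon-science/auto-rag-eval"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g., hitting the daily rate limit)
data = resp.json()
print(data)  # field names are undocumented here; check the actual payload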