aws-samples/genai-system-evaluation

A set of examples demonstrating how to evaluate Generative-AI-augmented systems using traditional information-retrieval metrics and LLM-as-a-judge validation techniques.

Score: 39 / 100 (Emerging)

This project helps evaluate how well your Generative AI applications, especially those using Retrieval-Augmented Generation (RAG), are performing. It takes in your AI model outputs and validation datasets, then provides scores and insights into the quality of responses. This is for AI developers, machine learning engineers, and data scientists who build and refine AI systems.

Use this if you are building an AI application and need to systematically test and improve its accuracy, relevance, and overall effectiveness before deployment.

Not ideal if you are looking for a plug-and-play solution for end-user AI evaluation without needing to delve into code or specific model configurations.
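
For a sense of what these evaluations look like, here is a minimal, hypothetical sketch of a recall@k check of the kind the notebooks cover; the function and document IDs below are invented for illustration and are not the repository's own API:

# Illustrative sketch only: recall@k for a retrieval step in a RAG pipeline.
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of labeled-relevant documents that appear in the top-k retrieved results."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

# One validation example: what the retriever returned vs. what was labeled relevant.
retrieved = ["doc-7", "doc-2", "doc-9", "doc-4"]
relevant = {"doc-2", "doc-4"}

print(recall_at_k(retrieved, relevant, k=3))  # 0.5: only doc-2 is in the top 3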

AI-development LLM-evaluation RAG-testing Generative-AI-quality machine-learning-engineering
No Package · No Dependents
Maintenance: 6 / 25
Adoption: 5 / 25
Maturity: 16 / 25
Community: 12 / 25


Stars: 11
Forks: 2
Language: Jupyter Notebook
License: MIT-0
Last pushed: Oct 24, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/aws-samples/genai-system-evaluation"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
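
The same data can also be fetched from a script. A minimal sketch using Python's standard library (the JSON shape of the response is an assumption here, so the example simply prints whatever the endpoint returns):

# Fetch the quality record for this repository and pretty-print the JSON payload.
import json
import urllib.request

url = "https://pt-edge.onrender.com/api/v1/quality/generative-ai/aws-samples/genai-system-evaluation"

with urllib.request.urlopen(url, timeout=10) as resp:
    data = json.load(resp)  # assumes the endpoint returns JSON, as the curl example suggests

print(json.dumps(data, indent=2))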