Vvkmnn/awesome-ai-eval

☑️ A curated list of tools, methods & platforms for evaluating AI reliability in real applications.

Score: 42 / 100 (Emerging)

This is a curated list of tools, methods, and platforms for verifying that AI systems, such as large language models or autonomous agents, behave reliably and avoid undesirable outputs like hallucinations. The listed resources help you compare a model's actual behavior against its desired outcomes and standard benchmarks. It is aimed at AI practitioners, machine learning engineers, and product managers responsible for developing, deploying, and maintaining AI systems in real-world applications.

Use this if you need to thoroughly test, debug, and monitor the quality and reliability of your AI models and applications in production.

Not ideal if you are looking for general machine learning development resources that are not specifically focused on AI evaluation.

Tags: AI evaluation · LLM testing · AI reliability · production AI monitoring · RAG pipeline testing
No package · No dependents
Maintenance: 10 / 25
Adoption: 8 / 25
Maturity: 13 / 25
Community: 11 / 25

How are scores calculated?
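
The component scores shown above sum exactly to the overall figure (10 + 8 + 13 + 11 = 42), so a plausible reading is that the total is an unweighted sum of four 25-point dimensions. Here is a minimal Python sketch under that assumption; the platform's actual formula is not documented here and may weight components differently:

# Minimal sketch: overall score as the sum of four 25-point components.
# ASSUMPTION: unweighted sum, inferred only from the displayed numbers;
# the platform's real scoring formula may differ.
components = {
    "Maintenance": 10,
    "Adoption": 8,
    "Maturity": 13,
    "Community": 11,
}
total = sum(components.values())
print(f"{total} / 100")  # -> 42 / 100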

Stars: 66
Forks: 7
Language: (none listed)
License: CC0-1.0
Last pushed: Feb 12, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Vvkmnn/awesome-ai-eval"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
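
If you prefer to consume the endpoint programmatically, here is a minimal Python sketch using only the standard library. It does not assume any particular response schema; it simply fetches the JSON and pretty-prints whatever top-level fields the API returns:

import json
import urllib.request

# Same endpoint as the curl example above.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Vvkmnn/awesome-ai-eval"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)  # assumes the endpoint returns JSON

# Inspect the schema before relying on specific field names.
print(json.dumps(data, indent=2))

The daily rate limits noted above presumably apply to any client, so cache responses if you poll regularly.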