greynewell/evaldriven.org

Ship evals before you ship features.

Quality score: 42 / 100 (Emerging)

This framework helps AI product managers and machine learning engineers define, measure, and enforce the correctness of AI systems. It guides you to specify what 'working' means through automated evaluations before writing any AI code, so that every AI feature ships with statistical evidence of its performance. The outcome is robust, verifiable AI applications that meet predefined quality and cost criteria.

Use this if you are developing AI-powered products and need a rigorous, automated way to ensure their quality, reliability, and cost-effectiveness from conception through continuous integration.

Not ideal if you are building traditional software, or do not need statistical evidence and continuous evaluation of probabilistic AI outputs.
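To make the eval-first workflow concrete, here is a minimal sketch of the idea described above: a tiny eval suite that gates shipping on a pass-rate threshold. All names (`run_model`, `EVAL_CASES`, `PASS_THRESHOLD`) are hypothetical illustrations, not APIs from evaldriven.org itself.

```python
# Illustrative eval-first sketch. The eval cases and threshold are
# written BEFORE the AI feature; the feature only ships if it passes.

PASS_THRESHOLD = 0.9  # hypothetical gate: ship only if >= 90% pass

# Hypothetical eval set: inputs paired with expected outputs.
EVAL_CASES = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def run_model(prompt: str) -> str:
    """Placeholder for the AI feature under test."""
    return {"2 + 2": "4", "capital of France": "Paris"}.get(prompt, "")

def evaluate() -> float:
    """Return the fraction of eval cases the model answers correctly."""
    passed = sum(run_model(c["input"]) == c["expected"] for c in EVAL_CASES)
    return passed / len(EVAL_CASES)

if __name__ == "__main__":
    score = evaluate()
    print(f"pass rate: {score:.0%}")
    # The CI gate: fail the build rather than ship a regression.
    assert score >= PASS_THRESHOLD, "eval gate failed: do not ship"
```

In a real setup the eval cases would live in version control alongside the feature, and the assertion would run in CI on every commit.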

Topics: AI Product Management, Machine Learning Engineering, AI Quality Assurance, Continuous Integration for AI, AI System Design
No package · No dependents

Maintenance: 10 / 25
Adoption: 6 / 25
Maturity: 11 / 25
Community: 15 / 25


Stars: 18
Forks: 5
Language: Nunjucks
License: CC0-1.0
Last pushed: Feb 25, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/greynewell/evaldriven.org"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
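The same request can be made from Python's standard library instead of curl. This is a sketch assuming the endpoint returns JSON (as REST APIs typically do); the `quality_url` helper is an illustration, not part of the documented API.

```python
# Sketch of calling the quality endpoint shown above from Python.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for a given category/owner/repo."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch and decode the quality data (assumes a JSON response)."""
    with urllib.request.urlopen(
        quality_url(category, owner, repo), timeout=10
    ) as resp:
        return json.load(resp)

if __name__ == "__main__":
    data = fetch_quality("ml-frameworks", "greynewell", "evaldriven.org")
    print(json.dumps(data, indent=2))
```

Note the anonymous tier allows 100 requests per day, so cache responses rather than polling.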