greynewell/evaldriven.org
Ship evals before you ship features.
This framework helps AI product managers and machine learning engineers define, measure, and enforce the correctness of AI systems. It guides you to specify what "working" means through automated evaluations before writing any AI code, so that every AI feature ships with statistical proof of its performance. The outcome is robust, verifiable AI applications that meet predefined quality and cost criteria.
Use this if you are developing AI-powered products and need a rigorous, automated way to ensure their quality, reliability, and cost-effectiveness from conception through continuous integration.
Not ideal if you are building traditional software or do not need statistical proof and continuous evaluation of probabilistic AI outputs.
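The repo's own code isn't shown on this page, but the eval-first pattern it describes is easy to sketch. The following is a minimal, illustrative Python example, not the framework's actual API: the eval cases, the generate stub, and the 90% pass threshold are all assumptions. The point is only that the eval and its threshold exist before the feature does, and that a nonzero exit code can gate CI.

# Minimal sketch of an eval-first workflow. Illustrative only:
# `generate` stands in for the AI feature under test, and the
# cases and threshold below are arbitrary assumptions.

def generate(prompt: str) -> str:
    """Hypothetical AI feature under test (not yet implemented)."""
    raise NotImplementedError("write the eval first, then this")

# The eval is specified before the feature exists. It defines
# "working" as a pass rate over a fixed case set.
CASES = [
    {"prompt": "Summarize: evals ship before features.", "must_include": "evals"},
    {"prompt": "Summarize: quality gates block regressions.", "must_include": "quality"},
]

def run_eval(min_pass_rate: float = 0.9) -> bool:
    passed = 0
    for case in CASES:
        try:
            output = generate(case["prompt"])
        except NotImplementedError:
            output = ""  # feature not built yet: the eval fails, as intended
        if case["must_include"].lower() in output.lower():
            passed += 1
    rate = passed / len(CASES)
    print(f"pass rate: {rate:.0%} (threshold {min_pass_rate:.0%})")
    return rate >= min_pass_rate

if __name__ == "__main__":
    raise SystemExit(0 if run_eval() else 1)  # nonzero exit fails the CI gate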
Stars: 18
Forks: 5
Language: Nunjucks
License: CC0-1.0
Category: ML Frameworks
Last pushed: Feb 25, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/greynewell/evaldriven.org"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
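The same endpoint can be called from Python with the standard library. A minimal sketch, assuming the endpoint returns JSON; the response schema isn't documented on this page, so the script just dumps whatever comes back. The Authorization header name is also a guess for how a key would be passed.

import json
from urllib.request import Request, urlopen

URL = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/greynewell/evaldriven.org"

def fetch(api_key=None):
    headers = {"Accept": "application/json"}
    if api_key:
        # Hypothetical auth mechanism; check the API docs for the real one.
        headers["Authorization"] = f"Bearer {api_key}"
    # Parse the response body as JSON (assumed content type).
    with urlopen(Request(URL, headers=headers), timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(json.dumps(fetch(), indent=2))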
Compare
Higher-rated alternatives
Cloud-CV/EvalAI
Evaluating state of the art in AI
fireindark707/Python-Schema-Matching
A Python tool using XGBoost and sentence-transformers to perform schema matching on tables.
graphbookai/graphbook
Visual AI development framework for training and inference of ML models, scaling pipelines, and...
visual-layer/fastdup
fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and...
github/CodeSearchNet
Datasets, tools, and benchmarks for representation learning of code.