GustyCube/ERR-EVAL
A benchmark for evaluating AI epistemic reliability: how well LLMs handle uncertainty, avoid hallucinations, and acknowledge what they don't know.
This benchmark evaluates how reliably an AI model handles uncertainty and incomplete information. It takes your model as input and produces a score across five critical areas, measuring how well the model detects ambiguity, avoids fabricating answers, and acknowledges what it doesn't know. AI product managers, researchers, and anyone deploying AI systems can use it to verify that their models are trustworthy and safe.
Use this if you need to rigorously test whether your model can recognize and respond appropriately to incomplete, noisy, or inconsistent data without hallucinating or overstating its confidence.
Not ideal if you want to improve performance on standard factual recall or on tasks where all necessary information is explicitly provided.
Stars: 9
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Jan 02, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/GustyCube/ERR-EVAL"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
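The same endpoint can be called from Python using only the standard library. This is a minimal sketch based on the curl example above; the `Authorization: Bearer` header name for the optional API key is an assumption, so check the API docs for the real scheme.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-API URL for a repository."""
    return f"{API_BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str, api_key=None) -> dict:
    """Fetch a repository's quality data as a dict.

    Without a key the call counts against the 100 requests/day limit.
    """
    req = urllib.request.Request(quality_url(category, owner, repo))
    if api_key:
        # Header name is an assumption, not confirmed by the API docs.
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

# Example (anonymous, rate-limited):
# data = fetch_quality("ml-frameworks", "GustyCube", "ERR-EVAL")
```

The response shape is not documented here, so the sketch returns the parsed JSON as-is rather than assuming specific fields.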
Higher-rated alternatives
Cloud-CV/EvalAI
Evaluating state of the art in AI
fireindark707/Python-Schema-Matching
A Python tool using XGBoost and sentence-transformers to perform schema matching on tables.
graphbookai/graphbook
Visual AI development framework for training and inference of ML models, scaling pipelines, and...
visual-layer/fastdup
fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and...
github/CodeSearchNet
Datasets, tools, and benchmarks for representation learning of code.