OpenDCAI/One-Eval
Automated system for LLM evaluation via agents.
One-Eval helps AI product managers and researchers automatically assess the quality and performance of large language models (LLMs). You describe your evaluation goals in natural language, and it produces reports detailing how well the model performs on tasks such as reasoning and general knowledge. It is aimed at anyone who develops, deploys, or oversees LLM-powered applications.
Use this if you need to quickly and automatically evaluate your LLMs using natural language prompts, eliminating the need for manual script writing and benchmark configuration.
Not ideal if your evaluation requires complex sandbox environments or capabilities such as code execution or Text2SQL, which are not yet fully supported.
Stars: 24
Forks: 2
Language: Python
License: Apache-2.0
Category: LLM tools
Last pushed: Mar 19, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/OpenDCAI/One-Eval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
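For quick scripting, the same endpoint can be called from Python. The sketch below assumes the endpoint returns a JSON object; the response schema is not documented here, so the code simply prints whatever fields come back.

import requests

# Fetch the quality data for OpenDCAI/One-Eval from the public endpoint shown
# above (no API key needed, up to 100 requests/day).
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/OpenDCAI/One-Eval"

response = requests.get(URL, timeout=10)
response.raise_for_status()
data = response.json()  # assumption: the endpoint returns a JSON object

# The schema is not documented here, so print every field in the payload.
for key, value in data.items():
    print(f"{key}: {value}")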
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs); supports 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents