OpenDCAI/One-Eval
Automated system for LLM evaluation via agents.
One-Eval helps AI product managers and researchers automatically assess the quality and performance of large language models (LLMs). You describe your evaluation goals in natural language, and it produces reports detailing how well the model performs on tasks such as reasoning and general knowledge. It is aimed at anyone who develops, deploys, or oversees LLM-powered applications.
Use this if you need to quickly and automatically evaluate your LLMs using natural language prompts, eliminating the need for manual script writing and benchmark configuration.
Not ideal if your evaluation requires complex sandbox environments or capabilities such as code execution or Text2SQL, which are not yet fully supported.
Stars: 24
Forks: 2
Language: Python
License: Apache-2.0
Category: LLM tools
Last pushed: Mar 19, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/OpenDCAI/One-Eval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
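For quick scripting, the same endpoint can be called from Python. The sketch below assumes the endpoint returns a JSON object; the response schema is not documented here, so the code simply prints whatever fields come back.

import requests

# Fetch the quality data for OpenDCAI/One-Eval from the public endpoint shown
# above (no API key needed, up to 100 requests/day).
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/OpenDCAI/One-Eval"

response = requests.get(URL, timeout=10)
response.raise_for_status()
data = response.json()  # assumption: the endpoint returns a JSON object

# The schema is not documented here, so print every field in the payload.
for key, value in data.items():
    print(f"{key}: {value}")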
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs); supports 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents