FastEval/FastEval
Fast & more realistic evaluation of chat language models. Includes leaderboard.
This tool helps AI model developers and researchers assess the performance of chat and instruction-following language models. You point it at a model and it produces detailed scores across benchmarks covering conversational ability, coding proficiency, and reasoning, giving you a concrete picture of how well the model handles different tasks.
189 stars. No commits in the last 6 months.
Use this if you are developing or fine-tuning large language models and need to rigorously benchmark their capabilities against established metrics.
Not ideal if you are looking for a platform to build or deploy LLM-powered applications, as its primary focus is on evaluation.
Stars: 189
Forks: 24
Language: Python
License: Apache-2.0
Category: llm-tools
Last pushed: Dec 23, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/FastEval/FastEval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval: One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas: Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit: Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval: The robust European language model benchmark.
Giskard-AI/giskard-oss: 🐢 Open-Source Evaluation & Testing library for LLM Agents