langwatch/langevals

LangEvals aggregates various language model evaluators into a single platform, providing a standard interface to a multitude of scores and LLM guardrails so you can protect and benchmark your LLM models and pipelines.

Score: 41 / 100 (Emerging)

This tool helps you evaluate and protect your language model applications by bringing various assessment methods together in one place: it takes your model outputs and returns a range of scores and safety checks. It is designed for anyone building or managing applications powered by large language models, such as product managers, AI safety engineers, or MLOps specialists.

Use this if you need a standardized way to measure the performance and safety of your language models and ensure they don't produce undesirable content.

Not ideal if you are looking for a tool to train or fine-tune language models, as this focuses solely on evaluation and guardrails.

Tags: LLM-evaluation, AI-safety, NLP-benchmarking, model-guardrails, AI-application-monitoring
No License · No Package · No Dependents
Maintenance: 10 / 25
Adoption: 9 / 25
Maturity: 8 / 25
Community: 14 / 25
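
These four dimension scores sum to the overall rating: 10 + 9 + 8 + 14 = 41 out of 100.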

Stars: 71
Forks: 10
Language: (not reported)
License: none
Last pushed: Feb 15, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/langwatch/langevals"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
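
For scripted access, here is a minimal shell sketch. It assumes the endpoint returns JSON; the X-API-Key header name, the PT_EDGE_API_KEY environment variable, and the use of jq for pretty-printing are illustrative assumptions, not documented parts of this API.

# Keyless tier (100 requests/day): fetch the quality record and pretty-print it.
curl -s "https://pt-edge.onrender.com/api/v1/quality/llm-tools/langwatch/langevals" | jq .

# Keyed tier (1,000 requests/day). The X-API-Key header is an assumed
# authentication scheme; check the API docs for the actual one.
curl -s -H "X-API-Key: $PT_EDGE_API_KEY" \
  "https://pt-edge.onrender.com/api/v1/quality/llm-tools/langwatch/langevals" | jq .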