Re-Align/just-eval
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
This tool helps developers and researchers evaluate the quality of responses generated by large language models (LLMs). You input a set of prompts and the LLM's generated answers, and it produces a detailed, multi-faceted assessment of those answers. It's designed for anyone who needs to benchmark and compare different LLMs based on criteria like helpfulness, clarity, factuality, depth, engagement, and safety.
No commits in the last 6 months.
Use this if you are developing or fine-tuning LLMs and need a structured, interpretable way to measure their performance across various quality aspects.
Not ideal if you need a simple pass/fail evaluation or are looking for a tool to evaluate human-written text rather than LLM outputs.
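To make "multi-aspect, interpretable assessment" concrete, the sketch below shows the general recipe such a GPT-based judge follows: prompt a strong model with a rubric over the six aspects and ask for a per-aspect score plus a short reason. This is an illustrative example using the openai Python client, not just-eval's actual CLI, prompts, or output format; the rubric wording, model name, and 1-5 scale here are assumptions.

```python
# Illustrative sketch only -- not just-eval's real interface. It shows the idea of
# multi-aspect, rubric-based scoring with a GPT judge, assuming the openai>=1.0
# Python client and an OPENAI_API_KEY in the environment.
import json
from openai import OpenAI

ASPECTS = ["helpfulness", "clarity", "factuality", "depth", "engagement", "safety"]

RUBRIC = (
    "You are evaluating an AI assistant's answer to a user query.\n"
    "For each aspect below, give an integer score from 1 (poor) to 5 (excellent) "
    "and a one-sentence reason.\n"
    f"Aspects: {', '.join(ASPECTS)}.\n"
    'Respond with a JSON object mapping each aspect to {"score": int, "reason": str}.'
)

def judge(query: str, answer: str, model: str = "gpt-4o-mini") -> dict:
    """Score one (query, answer) pair across all aspects with a GPT judge."""
    client = OpenAI()
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Query:\n{query}\n\nAnswer:\n{answer}"},
        ],
        response_format={"type": "json_object"},  # ask for parseable JSON
        temperature=0,
    )
    return json.loads(completion.choices[0].message.content)

if __name__ == "__main__":
    scores = judge("How do I reverse a list in Python?", "Use list.reverse() or reversed().")
    for aspect in ASPECTS:
        print(aspect, scores.get(aspect))
```

just-eval ships its own evaluation prompts and batching; see the repository for the actual command-line usage.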
Stars: 90
Forks: 7
Language: Python
License: MIT
Category: LLM tools
Last pushed: Jan 29, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Re-Align/just-eval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
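If you would rather consume the endpoint from code than from curl, a sketch along these lines should work. The Authorization header name and the shape of the returned JSON are assumptions, not a documented contract, so check the API docs before relying on them.

```python
# Minimal sketch of calling the listing API from Python with the requests library.
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Re-Align/just-eval"

def fetch(api_key: str | None = None) -> dict:
    # Hypothetical header name; the anonymous tier (100 requests/day) needs no key.
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    response = requests.get(URL, headers=headers, timeout=10)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    print(fetch())
```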
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents