Re-Align/just-eval
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
This tool helps developers and researchers evaluate the quality of responses generated by large language models (LLMs). You input a set of prompts and the LLM's generated answers, and it produces a detailed, multi-faceted assessment of those answers. It's designed for anyone who needs to benchmark and compare different LLMs based on criteria like helpfulness, clarity, factuality, depth, engagement, and safety.
No commits in the last 6 months.
Use this if you are developing or fine-tuning LLMs and need a structured, interpretable way to measure their performance across various quality aspects.
Not ideal if you need a simple pass/fail evaluation or are looking for a tool to evaluate human-written text rather than LLM outputs.
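To make "multi-aspect, interpretable assessment" concrete, the sketch below shows the general recipe such a GPT-based judge follows: prompt a strong model with a rubric over the six aspects and ask for a per-aspect score plus a short reason. This is an illustrative example using the openai Python client, not just-eval's actual CLI, prompts, or output format; the rubric wording, model name, and 1-5 scale here are assumptions.

```python
# Illustrative sketch only -- not just-eval's real interface. It shows the idea of
# multi-aspect, rubric-based scoring with a GPT judge, assuming the openai>=1.0
# Python client and an OPENAI_API_KEY in the environment.
import json
from openai import OpenAI

ASPECTS = ["helpfulness", "clarity", "factuality", "depth", "engagement", "safety"]

RUBRIC = (
    "You are evaluating an AI assistant's answer to a user query.\n"
    "For each aspect below, give an integer score from 1 (poor) to 5 (excellent) "
    "and a one-sentence reason.\n"
    f"Aspects: {', '.join(ASPECTS)}.\n"
    'Respond with a JSON object mapping each aspect to {"score": int, "reason": str}.'
)

def judge(query: str, answer: str, model: str = "gpt-4o-mini") -> dict:
    """Score one (query, answer) pair across all aspects with a GPT judge."""
    client = OpenAI()
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Query:\n{query}\n\nAnswer:\n{answer}"},
        ],
        response_format={"type": "json_object"},  # ask for parseable JSON
        temperature=0,
    )
    return json.loads(completion.choices[0].message.content)

if __name__ == "__main__":
    scores = judge("How do I reverse a list in Python?", "Use list.reverse() or reversed().")
    for aspect in ASPECTS:
        print(aspect, scores.get(aspect))
```

just-eval ships its own evaluation prompts and batching; see the repository for the actual command-line usage.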
Stars: 90
Forks: 7
Language: Python
License: MIT
Category: LLM tools
Last pushed: Jan 29, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Re-Align/just-eval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
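If you would rather consume the endpoint from code than from curl, a sketch along these lines should work. The Authorization header name and the shape of the returned JSON are assumptions, not a documented contract, so check the API docs before relying on them.

```python
# Minimal sketch of calling the listing API from Python with the requests library.
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Re-Align/just-eval"

def fetch(api_key: str | None = None) -> dict:
    # Hypothetical header name; the anonymous tier (100 requests/day) needs no key.
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    response = requests.get(URL, headers=headers, timeout=10)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    print(fetch())
```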
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents