Re-Align/just-eval

A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.

Score: 35 / 100 (Emerging)

This tool helps developers and researchers evaluate the quality of responses generated by large language models (LLMs). You input a set of prompts and the LLM's generated answers, and it produces a detailed, multi-faceted assessment of those answers. It's designed for anyone who needs to benchmark and compare different LLMs based on criteria like helpfulness, clarity, factuality, depth, engagement, and safety.

No commits in the last 6 months.

Use this if you are developing or fine-tuning LLMs and need a structured, interpretable way to measure their performance across various quality aspects.

Not ideal if you need a simple pass/fail evaluation or are looking for a tool to evaluate human-written text rather than LLM outputs.
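To make the workflow concrete, here is a minimal sketch of the kind of GPT-based, multi-aspect judging this tool performs, written with the OpenAI Python SDK. The judge prompt, the gpt-4o-mini model choice, the 1-to-5 scale, and the score_response helper are illustrative assumptions, not just-eval's actual interface.

import json
from openai import OpenAI

ASPECTS = ["helpfulness", "clarity", "factuality", "depth", "engagement", "safety"]

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_response(prompt: str, answer: str) -> dict:
    # Ask a GPT judge to rate one answer on each aspect (1-5) with a short reason.
    judge_prompt = (
        "Rate the following answer to the question on each of these aspects "
        f"({', '.join(ASPECTS)}) with an integer score from 1 to 5 and a one-sentence reason. "
        'Reply with a JSON object mapping each aspect to {"score": ..., "reason": ...}.\n\n'
        f"Question:\n{prompt}\n\nAnswer:\n{answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                      # assumed judge model
        messages=[{"role": "user", "content": judge_prompt}],
        response_format={"type": "json_object"},  # request machine-readable output
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)

scores = score_response(
    "How do I reverse a list in Python?",
    "Use reversed(my_list) or my_list[::-1]; both leave the original list unchanged.",
)
print(json.dumps(scores, indent=2))

In practice you would loop a judge like this over every prompt/answer pair in your evaluation set and aggregate the per-aspect scores.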

Tags: LLM evaluation, natural language processing, AI model benchmarking, generative AI development, conversational AI
Status: Stale (6 months), no package published, no dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 10 / 25


Stars: 90
Forks: 7
Language: Python
License: MIT
Last pushed: Jan 29, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Re-Align/just-eval"

Open to everyone: 100 requests/day with no API key needed. Get a free key for 1,000 requests/day.
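For scripted use, a Python equivalent of the curl call is sketched below; the structure of the returned JSON (e.g. whether the per-axis scores appear as top-level fields) is not documented here and should be checked against the actual response.

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Re-Align/just-eval"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g. rate limiting) early
data = resp.json()       # field names are assumptions; inspect the payload before relying on them
print(data)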