TIGER-AI-Lab/TIGERScore
"TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks" [TMLR 2024]
TIGERScore helps you quickly and thoroughly evaluate AI-generated text. It takes your instructions, the original context, and the AI's output, then provides a detailed breakdown of errors with explanations, locations, and penalty scores. This is ideal for anyone who needs to assess the quality of AI-generated content across various writing tasks without needing a perfect example to compare against.
No commits in the last 6 months.
Use this if you need a flexible, instruction-driven tool to automatically identify and explain mistakes in AI-generated text, especially when you don't have a perfect reference answer.
Not ideal if you only want a simple pass/fail score without detailed error explanations, or if your evaluation workflow relies purely on traditional reference-based metrics. A minimal usage sketch follows below.
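For a feel of the workflow described above, here is a minimal Python sketch of scoring one output with the repo's package. The TIGERScorer class, the score() call, its argument order, and the TIGER-Lab/TIGERScore-7B checkpoint name are recollections of the project README rather than facts stated on this page, so treat them as assumptions and check the repository for the exact interface.

    # Minimal sketch (assumed API; see the TIGERScore README for the exact interface).
    from tigerscore import TIGERScorer  # assumed package and class name

    # Load the reference-free scoring model (checkpoint name is an assumption).
    scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B")

    instructions = ["Summarize the article in one sentence."]
    input_contexts = ["The city council approved the new transit budget on Monday ..."]
    hypo_outputs = ["The council rejected the transit budget."]

    # Expected to return, per example, a list of identified errors with an
    # explanation, the error location, and a penalty score
    # (argument order here is an assumption).
    results = scorer.score(instructions, hypo_outputs, input_contexts)
    print(results[0])

Note that no reference output is passed anywhere: the instruction, the source context, and the hypothesis are enough, which is the reference-free evaluation the description above highlights.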
Stars: 32
Forks: 3
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Dec 21, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/TIGER-AI-Lab/TIGERScore"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
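If you would rather pull the same data from a script, a small Python sketch is below. The endpoint URL and the anonymous 100 requests/day limit come from the lines above; the response's JSON field names are not documented here, so the example just prints the raw payload.

    import requests

    # Same endpoint as the curl command above; anonymous access is limited to 100 requests/day.
    url = "https://pt-edge.onrender.com/api/v1/quality/transformers/TIGER-AI-Lab/TIGERScore"

    resp = requests.get(url, timeout=30)
    resp.raise_for_status()

    # Field names are not documented on this page, so print the raw JSON payload.
    print(resp.json())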
Higher-rated alternatives
eth-sri/matharena
Evaluation of LLMs on latest math competitions
tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality,...
HPAI-BSC/TuRTLe
TuRTLe: A Unified Evaluation of LLMs for RTL Generation 🐢 (MLCAD 2025)
nlp-uoregon/mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
haesleinhuepf/human-eval-bia
Benchmarking Large Language Models for Bio-Image Analysis Code Generation