TIGER-AI-Lab/TIGERScore
"TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks" [TMLR 2024]
TIGERScore helps you quickly and thoroughly evaluate AI-generated text. It takes your instructions, the original context, and the AI's output, then provides a detailed breakdown of errors with explanations, locations, and penalty scores. This is ideal for anyone who needs to assess the quality of AI-generated content across various writing tasks without needing a perfect example to compare against.
No commits in the last 6 months.
Use this if you need a flexible, instruction-driven tool to automatically identify and explain mistakes in AI-generated text, especially when you don't have a perfect reference answer.
Not ideal if you only want a simple pass/fail score without detailed error explanations, or if your evaluation workflow relies purely on traditional reference-based metrics. A minimal usage sketch follows below.
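For a feel of the workflow described above, here is a minimal Python sketch of scoring one output with the repo's package. The TIGERScorer class, the score() call, its argument order, and the TIGER-Lab/TIGERScore-7B checkpoint name are recollections of the project README rather than facts stated on this page, so treat them as assumptions and check the repository for the exact interface.

    # Minimal sketch (assumed API; see the TIGERScore README for the exact interface).
    from tigerscore import TIGERScorer  # assumed package and class name

    # Load the reference-free scoring model (checkpoint name is an assumption).
    scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B")

    instructions = ["Summarize the article in one sentence."]
    input_contexts = ["The city council approved the new transit budget on Monday ..."]
    hypo_outputs = ["The council rejected the transit budget."]

    # Expected to return, per example, a list of identified errors with an
    # explanation, the error location, and a penalty score
    # (argument order here is an assumption).
    results = scorer.score(instructions, hypo_outputs, input_contexts)
    print(results[0])

Note that no reference output is passed anywhere: the instruction, the source context, and the hypothesis are enough, which is the reference-free evaluation the description above highlights.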
Stars: 32
Forks: 3
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Dec 21, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/TIGER-AI-Lab/TIGERScore"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
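If you would rather pull the same data from a script, a small Python sketch is below. The endpoint URL and the anonymous 100 requests/day limit come from the lines above; the response's JSON field names are not documented here, so the example just prints the raw payload.

    import requests

    # Same endpoint as the curl command above; anonymous access is limited to 100 requests/day.
    url = "https://pt-edge.onrender.com/api/v1/quality/transformers/TIGER-AI-Lab/TIGERScore"

    resp = requests.get(url, timeout=30)
    resp.raise_for_status()

    # Field names are not documented on this page, so print the raw JSON payload.
    print(resp.json())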
Higher-rated alternatives
eth-sri/matharena
Evaluation of LLMs on latest math competitions
tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality,...
HPAI-BSC/TuRTLe
TuRTLe: A Unified Evaluation of LLMs for RTL Generation 🐢 (MLCAD 2025)
nlp-uoregon/mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
haesleinhuepf/human-eval-bia
Benchmarking Large Language Models for Bio-Image Analysis Code Generation