vectara/hallucination-leaderboard

Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents

Overall score: 55 / 100 (Established)

When you're choosing a Large Language Model (LLM) for document summarization, this leaderboard helps you compare candidates on reliability. It shows how often different LLMs invent information (hallucinate) when summarizing texts, as scored by Vectara's Hughes Hallucination Evaluation Model (HHEM). This is useful for anyone who relies on LLMs for factually accurate output, such as content creators, researchers, and data analysts.

3,122 stars. Actively maintained with 3 commits in the last 30 days.

Use this if you need to choose an LLM for summarizing content and want to ensure it provides factually consistent and reliable outputs.

Not ideal if you are evaluating LLMs for tasks other than summarization, or if you need to run your own hallucination evaluations rather than rely on pre-computed leaderboard results.

Tags: LLM evaluation · content summarization · fact-checking · AI reliability · natural language processing
No Package · No Dependents
Maintenance: 13 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 16 / 25
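The overall score appears to be the simple sum of the four sub-scores: 13 + 10 + 16 + 16 = 55 out of a possible 100. This is an inference from the numbers shown, not a documented formula.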


Stars: 3,122
Forks: 96
Language: Python
License: Apache-2.0
Last pushed: Mar 10, 2026
Commits (30d): 3

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/vectara/hallucination-leaderboard"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
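For scripted access, here is a minimal Python sketch hitting the same endpoint. It assumes the endpoint returns JSON; the response field names are not documented in this listing, and how an API key is passed is not specified here, so the script simply prints the whole payload for inspection.

# Minimal sketch: fetch this tool's quality data from the open endpoint.
# Assumes a JSON response; field names are undocumented in this listing,
# so we print the full payload rather than picking out specific keys.
import json
import requests

URL = (
    "https://pt-edge.onrender.com/api/v1/quality/llm-tools/"
    "vectara/hallucination-leaderboard"
)

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g. 429 if rate-limited)
print(json.dumps(resp.json(), indent=2))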