OpenMOSS/HalluQA

Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"

Overall score: 34 / 100 (Emerging)

This project helps evaluate how often Chinese Large Language Models (LLMs) generate incorrect or made-up information, a problem known as hallucination. It provides a benchmark dataset of carefully designed Chinese questions, along with scripts to assess your model's answers. The output is a "non-hallucination rate" or accuracy score, indicating your model's reliability. This is for researchers, product managers, or anyone working with Chinese LLMs who needs to quantify and improve their models' factual accuracy.
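To make the metric concrete, here is a minimal Python sketch of how a non-hallucination rate can be computed from per-question judgments. The results file format and the "is_hallucination" field are assumptions for illustration only, not HalluQA's actual script interface:

import json

def non_hallucination_rate(results_path):
    # Load per-question judgments. Assumed format: a JSON list of records,
    # each with a boolean "is_hallucination" field (hypothetical schema,
    # not the repo's own output format).
    with open(results_path, encoding="utf-8") as f:
        results = json.load(f)
    if not results:
        return 0.0
    # Fraction of answers judged free of hallucination.
    correct = sum(1 for r in results if not r["is_hallucination"])
    return correct / len(results)

# Example: a model judged correct on 30 of 40 questions scores 0.75.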

136 stars. No commits in the last 6 months.

Use this if you need to measure and compare the hallucination rates of various Chinese Large Language Models, especially for tasks involving knowledge or sensitive information.

Not ideal if you are working with non-Chinese LLMs or if your primary concern is not model hallucination.

Tags: Large Language Models · NLP Evaluation · AI Trustworthiness · Chinese AI · Hallucination Detection

Status: Stale (6 months) · No Package · No Dependents

Score breakdown:
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 8 / 25

How are scores calculated?
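From the figures on this page, the overall score appears to be the sum of the four category scores, each out of 25: 0 + 10 + 16 + 8 = 34 / 100.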

Stars: 136
Forks: 6
Language: Python
License: Apache-2.0
Last pushed: Jun 05, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/OpenMOSS/HalluQA"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
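For scripted access, here is a minimal Python sketch against the endpoint above. The exact response schema is not documented here, so the code simply prints the returned JSON for inspection:

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/OpenMOSS/HalluQA"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g. hitting the 100 requests/day limit)
print(resp.json())  # inspect the returned fields before relying on a specific schema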