141forever/DiaHalu

This is the repository for the paper 'DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models' (EMNLP 2024 Findings).

Score: 14 / 100 (Experimental)

This project provides a dialogue-level benchmark dataset for evaluating whether large language models (LLMs) generate inaccurate or misleading information over the course of a conversation. It offers sample dialogues, along with labels indicating whether a hallucination (a factual error or incoherent statement) occurred and detailed explanations of each judgment. AI researchers, NLP practitioners, and product managers working with conversational AI will find it useful for testing and improving their models.
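As a rough illustration of how such a dataset is typically consumed, here is a minimal Python sketch. The file name (diahalu_samples.json) and the field names (dialogue, label, explanation) are hypothetical assumptions for illustration only, not the repository's documented schema; check the repo itself for the actual format.

# Hypothetical sketch: iterate over dialogue-level hallucination annotations.
# The file name and field names below are assumptions, not DiaHalu's
# documented schema.
import json

with open("diahalu_samples.json", encoding="utf-8") as f:
    records = json.load(f)

for rec in records:
    # Each record is assumed to pair a dialogue with a hallucination
    # label and a free-text explanation.
    print("Dialogue:", rec["dialogue"][:80], "...")
    print("Hallucination label:", rec["label"])
    print("Explanation:", rec["explanation"])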

No commits in the last 6 months.

Use this if you need a specialized dataset to benchmark and improve the accuracy and truthfulness of your large language model during dialogue-based interactions.

Not ideal if you are looking for a dataset to evaluate single-turn prompt responses rather than full conversational exchanges.

conversational-AI LLM-evaluation natural-language-processing AI-safety dialogue-systems
No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 8 / 25
Community 0 / 25

How are scores calculated?

Stars: 18
Forks:
Language:
License: None
Last pushed: Apr 05, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/141forever/DiaHalu"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
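For programmatic use, a minimal Python sketch of the same request is shown below; it simply pretty-prints whatever JSON the endpoint returns, since the response schema is not documented on this page.

# Minimal sketch: fetch this repo's quality data from the public API
# (100 requests/day without a key, per the note above) and pretty-print
# the JSON response. The response schema is not documented here, so no
# specific fields are assumed.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/141forever/DiaHalu"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2, ensure_ascii=False))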