PKU-YuanGroup/Hallucination-Attack
Attack to induce hallucinations in LLMs
This project helps evaluate how easily large language models (LLMs) can be tricked into generating false information, or 'hallucinations.' It feeds a standard LLM specially crafted, often nonsensical prompts to see whether the model can be made to produce fabricated facts or fake news. This is useful for AI safety researchers, red teamers, and anyone responsible for assessing the reliability and potential risks of LLMs before deployment.
164 stars. No commits in the last 6 months.
Use this if you need to rigorously test an LLM's susceptibility to generating false or misleading content when given unusual or adversarial inputs.
Not ideal if you are looking to improve the factual accuracy of an LLM or fine-tune it for a specific task.
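To make the idea concrete, the sketch below shows a minimal, hypothetical evaluation loop of the kind this project automates: query a model with candidate adversarial prompts and flag outputs that assert a target false claim. It is not the repository's actual attack code; the model name, prompts, and substring check are all placeholder assumptions.

# Minimal illustrative harness (not this repo's code): query a causal LM with
# candidate adversarial prompts and flag outputs that assert a target false claim.
# The model name, prompts, and substring check below are placeholder assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; substitute the model under test
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

# Hypothetical (prompt, false claim it tries to elicit) pairs.
cases = [
    ("xq!! lunar dairy report confirms:", "the moon is made of cheese"),
]

for prompt, false_claim in cases:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=False)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True).lower()
    hallucinated = false_claim in text  # crude check; real evaluation needs human or model-based review
    print(f"{prompt!r} -> hallucinated={hallucinated}")

In practice, adversarial prompts of this kind are typically found by automated search rather than written by hand, and deciding whether an output is a hallucination usually needs more than a substring match.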
Stars: 164
Forks: 21
Language: Python
License: MIT
Category:
Last pushed: May 17, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/PKU-YuanGroup/Hallucination-Attack"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
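For programmatic access, here is a small Python sketch using the requests library to call the same endpoint; the response field names are not documented here, so inspect the JSON payload before relying on specific keys.

# Fetch the same quality data shown by the curl command above.
# Response field names are not documented here, so print the payload and inspect it first.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/PKU-YuanGroup/Hallucination-Attack"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()
print(data)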
Higher-rated alternatives
vectara/hallucination-leaderboard
Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents
amir-hameed-mir/Sirraya_LSD_Code
Layer-wise Semantic Dynamics (LSD) is a model-agnostic framework for hallucination detection in...
NishilBalar/Awesome-LVLM-Hallucination
up-to-date curated list of state-of-the-art Large vision language models hallucinations...
intuit/sac3
Official repo for SAC3: Reliable Hallucination Detection in Black-Box Language Models via...
HillZhang1999/llm-hallucination-survey
Reading list of hallucination in LLMs. Check out our new survey paper: "Siren’s Song in the AI...