Django-Jiang/BadChain

[ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models

/ 100

Emerging

This project helps AI safety researchers and red teamers evaluate the robustness of large language models (LLMs) against subtle attacks. It takes typical LLM prompts and intentionally crafted 'backdoor' demonstration examples, then shows how the LLM's reasoning process and final answers can be subtly manipulated when specific trigger phrases are present in a query. The ideal users are researchers focused on AI security, safety, and adversarial machine learning.

No commits in the last 6 months.

Use this if you need to understand and demonstrate how malicious actors could subtly implant 'backdoor' behaviors into large language models without access to their training data or internal parameters.

Not ideal if you are looking for a tool to improve the general performance or alignment of your LLM, or if you need to perform traditional fine-tuning or prompt engineering.

AI safety LLM security adversarial AI red teaming model interpretability

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 12 / 25

How are scores calculated?

Stars

Forks

Language

Python

License

MIT

Higher-rated alternatives

langchain-ai/langchain-aws

Build LangChain Applications on AWS

brainlid/langchain

Elixir implementation of a LangChain style framework that lets Elixir projects integrate with...

langchain-ai/langchain-weaviate

🦜🔗 LangChain interface to Weaviate

langchain-ai/langchain-litellm

🦜🔗 LangChain interface to LiteLLM

langchain-ai/langchain-mongodb

Integrations between MongoDB, Atlas, LangChain, and LangGraph

Explore LLM Tools

All categories Trending LLM Tool directory Insights