Django-Jiang/BadChain
[ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
This project helps AI safety researchers and red teamers evaluate the robustness of large language models (LLMs) against subtle attacks. It takes typical LLM prompts and intentionally crafted 'backdoor' demonstration examples, then shows how the LLM's reasoning process and final answers can be subtly manipulated when specific trigger phrases are present in a query. The ideal users are researchers focused on AI security, safety, and adversarial machine learning.
No commits in the last 6 months.
Use this if you need to understand and demonstrate how malicious actors could subtly implant 'backdoor' behaviors into large language models without access to their training data or internal parameters.
Not ideal if you are looking for a tool to improve the general performance or alignment of your LLM, or if you need to perform traditional fine-tuning or prompt engineering.
Stars
49
Forks
6
Language
Python
License
MIT
Category
Last pushed
Jul 24, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Django-Jiang/BadChain"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
langchain-ai/langchain-aws
Build LangChain Applications on AWS
brainlid/langchain
Elixir implementation of a LangChain style framework that lets Elixir projects integrate with...
langchain-ai/langchain-weaviate
🦜🔗 LangChain interface to Weaviate
langchain-ai/langchain-litellm
🦜🔗 LangChain interface to LiteLLM
langchain-ai/langchain-mongodb
Integrations between MongoDB, Atlas, LangChain, and LangGraph