RobustNLP/CipherChat

A framework to evaluate the generalization capability of safety alignment for LLMs

Quality score: 46 / 100 (Emerging)

This framework helps AI safety researchers and developers evaluate how well large language models (LLMs) maintain their safety alignment when queries are expressed in non-natural, cipher-encoded language. It takes an LLM, a dataset, and a set of chosen ciphers as input, then generates query-response pairs showing how the LLM behaves under each cipher. It is aimed at researchers and developers focused on LLM safety and robustness.

626 stars. No commits in the last 6 months.

Use this if you need to systematically test whether an LLM's safety features can be bypassed when instructions or queries are disguised as ciphers rather than written in plain language.

Not ideal if you want to benchmark general LLM performance, or to apply safety measures directly in user-facing applications outside a research context.

Tags: AI Safety · LLM Evaluation · Adversarial Robustness · Natural Language Processing · Research
Flags: Stale (6 months) · No Package · No Dependents
Maintenance: 2 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 18 / 25

How are scores calculated?
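The overall score appears to be the sum of the four 25-point subscores listed above; a quick check in Python (the dict literal simply restates the listed values):

```python
# Subscores from the listing, each out of 25.
subscores = {"Maintenance": 2, "Adoption": 10, "Maturity": 16, "Community": 18}

# The overall quality score (out of 100) matches their sum.
overall = sum(subscores.values())
print(overall)  # 46
```

Note that Maintenance (2 / 25) is the main drag on the score, consistent with the "Stale" flag and the absence of recent commits.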

Stars: 626
Forks: 68
Language: Python
License: MIT
Last pushed: Oct 09, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/RobustNLP/CipherChat"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
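The curl call above can also be reproduced in Python using only the standard library. This is a minimal sketch; the `quality_url` helper is mine, and the assumption that the endpoint returns JSON is not confirmed by the source:

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repo slug, e.g. "RobustNLP/CipherChat"."""
    return f"{BASE}/{category}/{repo}"

url = quality_url("llm-tools", "RobustNLP/CipherChat")
print(url)

# Uncomment to actually fetch (100 requests/day without a key; response assumed JSON):
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
#     print(json.dumps(data, indent=2))
```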