RobustNLP/CipherChat

A framework to evaluate the generalization capability of safety alignment for LLMs

Quality score: 46 / 100 (Emerging)

This framework helps AI safety researchers and developers evaluate how well large language models (LLMs) maintain their safety alignment when queries are expressed in non-natural, cipher-encoded language. It takes an LLM, a dataset, and a set of chosen ciphers as input, then generates query-response pairs showing how the LLM behaves under each cipher. It is aimed at researchers and developers focused on LLM safety and robustness.

626 stars. No commits in the last 6 months.

Use this if you need to systematically test whether an LLM's safety features can be bypassed when instructions or queries are disguised as ciphers rather than written in plain language.

Not ideal if you want to benchmark general LLM performance, or to apply safety measures directly in user-facing applications outside a research context.

Tags: AI Safety · LLM Evaluation · Adversarial Robustness · Natural Language Processing · Research
Flags: Stale (6 months) · No Package · No Dependents
Maintenance: 2 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 18 / 25

How are scores calculated?
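The overall score appears to be the sum of the four 25-point subscores listed above; a quick check in Python (the dict literal simply restates the listed values):

```python
# Subscores from the listing, each out of 25.
subscores = {"Maintenance": 2, "Adoption": 10, "Maturity": 16, "Community": 18}

# The overall quality score (out of 100) matches their sum.
overall = sum(subscores.values())
print(overall)  # 46
```

Note that Maintenance (2 / 25) is the main drag on the score, consistent with the "Stale" flag and the absence of recent commits.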

Stars: 626
Forks: 68
Language: Python
License: MIT
Last pushed: Oct 09, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/RobustNLP/CipherChat"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
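The curl call above can also be reproduced in Python using only the standard library. This is a minimal sketch; the `quality_url` helper is mine, and the assumption that the endpoint returns JSON is not confirmed by the source:

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repo slug, e.g. "RobustNLP/CipherChat"."""
    return f"{BASE}/{category}/{repo}"

url = quality_url("llm-tools", "RobustNLP/CipherChat")
print(url)

# Uncomment to actually fetch (100 requests/day without a key; response assumed JSON):
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
#     print(json.dumps(data, indent=2))
```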