RobustNLP/DeRTa

A novel approach to improving the safety of large language models, enabling them to transition effectively from an unsafe state to a safe state.

Quality score: 31 / 100 (Emerging)

This project helps make large language models (LLMs) safer by training them to refuse harmful requests more effectively. It takes an existing LLM and specialized training data, then outputs a refined LLM that is better at identifying and declining unsafe prompts. This tool is designed for AI safety researchers and developers who are responsible for ensuring their LLM applications are secure and reliable.

No commits in the last 6 months.

Use this if you need to improve the safety of a large language model, making it more robust at refusing to generate harmful or inappropriate content.

Not ideal if your primary goal is to enhance the model's performance on general tasks rather than its safety refusal capabilities.

AI Safety · Large Language Models · Content Moderation · Model Refusal · Responsible AI
Stale (6m) · No Package · No Dependents
Maintenance 2 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 4 / 25


Stars: 72
Forks: 2
Language: Python
License: MIT
Last pushed: May 22, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/RobustNLP/DeRTa"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
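For programmatic use, here is a minimal Python sketch of the same request using the requests library. The Authorization header shown for keyed access is an assumption, not documented above; check the API docs for the exact authentication scheme.

import requests

# Quality data endpoint shown above.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/RobustNLP/DeRTa"

# Optional API key for the higher rate limit. The header name below is an
# assumption; consult the API documentation for the actual scheme.
API_KEY = None  # e.g. "your-key-here"

headers = {"Authorization": f"Bearer {API_KEY}"} if API_KEY else {}

response = requests.get(URL, headers=headers, timeout=10)
response.raise_for_status()

# The response is JSON with the quality scores and repo stats listed above.
print(response.json())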