zjunlp/AutoSteer

[EMNLP 2025] AutoSteer: Automating Steering for Safe Multimodal Large Language Models

Score: 36 / 100 (Emerging)

AutoSteer is a framework for steering multimodal large language models (MLLMs) toward safe and appropriate outputs. Given an existing MLLM and training datasets containing potentially harmful images (e.g., NSFW or violent content), it trains and integrates a 'steer matrix' and a 'prober' that together reduce harmful responses at inference time. It is aimed at AI safety researchers and MLLM developers working to mitigate bias and toxic output.
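To make the prober/steer-matrix idea concrete, here is a minimal sketch of prober-gated activation steering in general. All names, shapes, and the gating logic are hypothetical illustrations of the technique, not AutoSteer's actual code or API:

```python
import numpy as np

# Hypothetical sketch: a "prober" scores hidden states for unsafe content,
# and a "steer matrix" is applied only to the activations it flags.
# None of these names come from the AutoSteer repository.

def prober(h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Toy linear prober: probability that each hidden state is unsafe."""
    return 1.0 / (1.0 + np.exp(-(h @ w)))  # sigmoid over a linear score

def steer(h: np.ndarray, steer_matrix: np.ndarray,
          w: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Replace flagged activations with their steered counterparts."""
    p_unsafe = prober(h, w)            # shape (batch,)
    steered = h @ steer_matrix         # steered version of every activation
    return np.where(p_unsafe[:, None] > threshold, steered, h)

rng = np.random.default_rng(0)
hidden_dim = 8
h = rng.standard_normal((2, hidden_dim))   # toy batch of hidden states
w = rng.standard_normal(hidden_dim)        # toy prober weights
steer_matrix = np.eye(hidden_dim) * 0.5    # toy matrix that damps activations
out = steer(h, steer_matrix, w)
print(out.shape)  # (2, 8)
```

In a real system both the prober weights and the steer matrix would be learned from the safety datasets mentioned above; the key design point is that steering is conditional, so benign activations pass through unchanged.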

No commits in the last 6 months.

Use this if you are developing or deploying multimodal large language models and need to reduce the generation of harmful content during inference.

Not ideal if you want a pre-packaged, zero-configuration solution for general-purpose MLLM safety that avoids touching model internals or preparing datasets.

Topics: AI Safety · Multimodal AI · Harmful Content Detection · Large Language Models · Model Detoxification
Flags: Stale (6m) · No Package · No Dependents
Maintenance 2 / 25
Adoption 5 / 25
Maturity 15 / 25
Community 14 / 25


Stars: 13
Forks: 3
Language: Python
License: MIT
Last pushed: Aug 21, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/zjunlp/AutoSteer"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
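The curl call above can also be made from Python using only the standard library. The endpoint URL is taken from this page; the error handling is a suggested pattern, not part of the API:

```python
import json
import urllib.request

# Fetch the quality data for zjunlp/AutoSteer from the endpoint shown above.
url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/zjunlp/AutoSteer"
req = urllib.request.Request(url, headers={"Accept": "application/json"})

try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        data = json.load(resp)          # parsed JSON quality report
    print(data)
except (OSError, ValueError) as exc:    # network down, endpoint unreachable, or bad JSON
    data = None
    print(f"request failed: {exc}")
```

No API key is required at the 100 requests/day tier, so the request carries no authentication header.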