zjunlp/AutoSteer
[EMNLP 2025] AutoSteer: Automating Steering for Safe Multimodal Large Language Models
AutoSteer is a framework for steering multimodal large language models (MLLMs) toward safe and appropriate outputs. Given an existing MLLM and training datasets containing potentially harmful images (e.g. NSFW or violent content), it trains and integrates a 'steer matrix' and a 'prober' that reduce harmful responses at inference time. The tool targets AI safety researchers and MLLM developers working on mitigating bias and toxic output.
No commits in the last 6 months.
Use this if you are developing or deploying multimodal large language models and need to reduce the generation of harmful content during inference.
Not ideal if you are looking for a pre-packaged, zero-configuration solution for general-purpose MLLM safety without needing to interact with model internals or dataset preparation.
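To make the prober/steer-matrix idea above concrete, here is a minimal inference-time sketch. This is not AutoSteer's actual API; the function names, the logistic prober, and the toy steer matrix are all illustrative assumptions about how such a component could gate a steering intervention on a hidden activation.

```python
import numpy as np

def prober_score(hidden, w, b):
    """Hypothetical safety prober: logistic score on a hidden state (1.0 = unsafe)."""
    return 1.0 / (1.0 + np.exp(-(hidden @ w + b)))

def steer_step(hidden, w, b, steer_matrix, threshold=0.5):
    """If the prober flags the activation as unsafe, apply the steer matrix;
    otherwise pass the activation through unchanged."""
    if prober_score(hidden, w, b) > threshold:
        return hidden @ steer_matrix
    return hidden

# Toy demonstration with random weights (purely illustrative).
rng = np.random.default_rng(0)
d = 8
hidden = rng.normal(size=d)
w = rng.normal(size=d)
steer = 0.5 * np.eye(d)  # toy steer matrix that shrinks the activation
out = steer_step(hidden, w, b=0.0, steer_matrix=steer)
```

In the real framework, the steer matrix and prober weights are learned from the harmful-content datasets; this sketch only shows where such a gate would sit in the forward pass.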
Stars: 13
Forks: 3
Language: Python
License: MIT
Last pushed: Aug 21, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/zjunlp/AutoSteer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
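The same endpoint can be queried from Python with only the standard library. A minimal sketch, assuming the endpoint returns JSON (the response schema is not documented here, so the result is handled as an opaque dict, and network failures return None):

```python
import json
import urllib.request

# Endpoint shown in the listing; anonymous access is limited to 100 requests/day.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/zjunlp/AutoSteer"

def fetch_repo_quality(url=URL, timeout=10):
    """Fetch the JSON quality record for the repo; returns None on network failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except OSError:
        return None

record = fetch_repo_quality()
```

With a free API key, the daily limit rises to 1,000 requests; how the key is passed (header vs. query parameter) is not specified on this page.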
Higher-rated alternatives
zjunlp/KnowledgeEditingPapers
Must-read Papers on Knowledge Editing for Large Language Models.
zjunlp/CaKE
[EMNLP 2025] Circuit-Aware Editing Enables Generalizable Knowledge Learners
zjunlp/unlearn
[ACL 2025] Knowledge Unlearning for Large Language Models
OFA-Sys/Ditto
A self-alignment method for role-play. Benchmark for role-play. Resources for "Large Language...
VinAIResearch/HPR
Householder Pseudo-Rotation: A Novel Approach to Activation Editing in LLMs with...