zhiyichin/P4D

[ICML 2024] Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts (Official PyTorch Implementation)

Quality score: 33 / 100 (Emerging)

This tool helps AI safety researchers and content moderation teams find prompts that bypass safety filters in text-to-image AI models like Stable Diffusion. It takes an existing dataset of 'safe' prompts and automatically generates new, problematic prompts that reveal vulnerabilities in a model's safety mechanisms. The output helps improve the robustness of generative AI systems against misuse, especially concerning copyrighted or NSFW content.

Use this if you are responsible for evaluating and hardening the safety mechanisms of text-to-image generative AI models against undesirable content generation.

Not ideal if you are looking to create new, high-quality images or enhance existing prompts for creative purposes.

Tags: AI safety · content moderation · generative AI · red teaming · model evaluation
No package · No dependents
Maintenance: 6 / 25
Adoption: 8 / 25
Maturity: 16 / 25
Community: 3 / 25


Stars: 52
Forks: 1
Language: Python
License: MIT
Last pushed: Jan 11, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/zhiyichin/P4D"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
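If you want to consume this endpoint programmatically rather than via curl, a minimal Python sketch follows. The URL is copied verbatim from the listing above; the assumption that the endpoint returns a JSON body is mine and should be verified against the actual response before relying on specific keys.

```python
import json
import urllib.request

# Endpoint copied from the listing above. The JSON response schema is an
# assumption -- inspect the actual payload before depending on specific keys.
API_URL = "https://pt-edge.onrender.com/api/v1/quality/diffusion/zhiyichin/P4D"

def fetch_quality(url: str = API_URL) -> dict:
    """Fetch the quality report (free tier: 100 requests/day, no key needed)
    and decode it as JSON."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    report = fetch_quality()
    print(json.dumps(report, indent=2))
```

With a free API key (1,000 requests/day), you would presumably pass it as a header or query parameter; the listing does not specify the mechanism, so check the service's documentation.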