zhiyichin/P4D

[ICML 2024] Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts (Official PyTorch Implementation)

Quality score: 33 / 100 (Emerging)

This tool helps AI safety researchers and content moderation teams find prompts that bypass safety filters in text-to-image AI models like Stable Diffusion. It takes an existing dataset of 'safe' prompts and automatically generates new, problematic prompts that reveal vulnerabilities in a model's safety mechanisms. The output helps improve the robustness of generative AI systems against misuse, especially concerning copyrighted or NSFW content.

Use this if you are responsible for evaluating and hardening the safety mechanisms of text-to-image generative AI models against undesirable content generation.

Not ideal if you are looking to create new, high-quality images or enhance existing prompts for creative purposes.

Tags: AI safety · content moderation · generative AI · red teaming · model evaluation
No package · No dependents
Maintenance: 6 / 25
Adoption: 8 / 25
Maturity: 16 / 25
Community: 3 / 25


Stars: 52
Forks: 1
Language: Python
License: MIT
Last pushed: Jan 11, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/zhiyichin/P4D"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
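If you want to consume this endpoint programmatically rather than via curl, a minimal Python sketch follows. The URL is copied verbatim from the listing above; the assumption that the endpoint returns a JSON body is mine and should be verified against the actual response before relying on specific keys.

```python
import json
import urllib.request

# Endpoint copied from the listing above. The JSON response schema is an
# assumption -- inspect the actual payload before depending on specific keys.
API_URL = "https://pt-edge.onrender.com/api/v1/quality/diffusion/zhiyichin/P4D"

def fetch_quality(url: str = API_URL) -> dict:
    """Fetch the quality report (free tier: 100 requests/day, no key needed)
    and decode it as JSON."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    report = fetch_quality()
    print(json.dumps(report, indent=2))
```

With a free API key (1,000 requests/day), you would presumably pass it as a header or query parameter; the listing does not specify the mechanism, so check the service's documentation.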