YangLing0818/RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)

/ 100

Emerging

This project helps artists, designers, and marketers generate detailed, specific images from complex text descriptions. You provide a detailed text prompt, and it produces a high-resolution image that closely follows the description, even when the prompt involves multiple objects and scenes. It is aimed at anyone who needs accurate visual content from complex textual ideas.

1,843 stars. No commits in the last 6 months.

Use this if you need images that closely match intricate, multi-part text descriptions, with each element and the relationships between them depicted accurately.

Not ideal if you only need simple image generation from basic prompts, or if you prefer not to integrate with external multimodal LLMs.

digital art, graphic design, content creation, visual storytelling, marketing visuals

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 17 / 25


Stars: 1,843

Forks: 103

Language: Jupyter Notebook

License: MIT

Higher-rated alternatives

zai-org/CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

zhaorw02/DeepMesh

[ICCV 2025] Official code of DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

thu-nics/FrameFusion

[ICCV'25] The official code of paper "Combining Similarity and Importance for Video Token...

Yushi-Hu/tifa

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

OpenMeshLab/MeshXL

[NeurIPS 2024] MeshXL: Neural Coordinate Field for Generative 3D Foundation Models, a 3D...
