YangLing0818/RPG-DiffusionMaster
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
This project generates detailed, specific images from complex text descriptions, and is aimed at artists, designers, and marketers. You supply a detailed text prompt, and it produces a high-resolution image intended to follow that description closely, including prompts that involve multiple objects and scenes.
1,843 stars. No commits in the last 6 months.
Use this if you need images that closely follow intricate, multi-part text descriptions, with each element and the relationships between elements depicted accurately.
Not ideal if you only need simple image generation from basic prompts, or if you prefer not to depend on external multimodal LLMs.
Stars: 1,843
Forks: 103
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Feb 01, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/YangLing0818/RPG-DiffusionMaster"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
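The same request can be made from Python. The sketch below is a minimal illustration, not documented client code: it assumes the URL path follows a `category/owner/repo` pattern (inferred from the single curl example above) and that the endpoint returns JSON, neither of which the listing confirms. The function names `build_quality_url` and `fetch_quality` are hypothetical, and no API-key header is shown because the listing does not say how a key is passed.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example in the listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, inferred from
    the one example URL shown on the page.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON. If it is not, json.loads
    raises a ValueError and the caller should inspect the raw body.
    """
    url = build_quality_url(category, owner, repo)
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to make a real request.
    print(build_quality_url("diffusion", "YangLing0818", "RPG-DiffusionMaster"))
```

Because the keyless tier is limited to 100 requests per day, it may be worth caching the result of `fetch_quality` locally rather than calling it on every run.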
Higher-rated alternatives
zai-org/CogVideo
text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
zhaorw02/DeepMesh
[ICCV 2025] Official code of DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
thu-nics/FrameFusion
[ICCV'25] The official code of paper "Combining Similarity and Importance for Video Token...
Yushi-Hu/tifa
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
OpenMeshLab/MeshXL
[NeurIPS 2024] MeshXL: Neural Coordinate Field for Generative 3D Foundation Models, a 3D...