Nithin-GK/MaxFusion
[ECCV'24] MaxFusion: Plug & Play multimodal generation in text to image diffusion models
MaxFusion helps digital artists and content creators combine various creative influences to generate unique images. You provide descriptive text alongside other guiding inputs like sketches or reference images, and it outputs a high-quality image that blends all these elements. It's for anyone in creative fields who wants more control and flexibility in generating visual content.
No commits in the last 6 months.
Use this if you need to generate images that incorporate multiple, sometimes conflicting, creative inputs beyond just text prompts.
Not ideal if you only need basic text-to-image generation, or if your workflow depends on extensive custom model training — MaxFusion is a training-free, plug-and-play approach.
Stars: 27
Forks: 2
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Nov 02, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/Nithin-GK/MaxFusion"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
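If you prefer to consume the endpoint from a script rather than curl, a minimal Python sketch follows. The URL path layout is taken from the curl example above; the shape of the JSON response is not documented on this page, so the helper names (`quality_url`, `fetch_quality`) and the assumption that the endpoint returns a JSON object are illustrative only.

```python
import json
from urllib.request import urlopen

# Base path copied from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/diffusion"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repo endpoint URL (path layout from the curl example)."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload. The response schema is not
    documented here, so inspect the returned keys before relying on them."""
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# Network call left to the caller, e.g.:
#   data = fetch_quality("Nithin-GK", "MaxFusion")
print(quality_url("Nithin-GK", "MaxFusion"))
```

Unauthenticated calls are limited to 100 requests/day, so cache responses locally if you poll many repositories.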
Higher-rated alternatives
UCSC-VLAA/story-iter
[ICLR 2026] A Training-free Iterative Framework for Long Story Visualization
PaddlePaddle/PaddleMIX
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks,...
keivalya/mini-vla
a minimal, beginner-friendly VLA to show how robot policies can fuse images, text, and states to...
adobe-research/custom-diffusion
Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)
byliutao/1Prompt1Story
🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation...