PaddlePaddle/PaddleMIX
Paddle Multimodal Integration and eXploration: a toolkit for mainstream multimodal tasks, including end-to-end large-scale multimodal pretraining models and a diffusion-model toolbox, built for high performance and flexibility.
This project helps creators, marketers, and researchers build AI applications that understand and generate content across images, text, and video. You supply media and textual descriptions, and it produces results such as images generated from text, controlled video animations, or structured information extracted from complex documents. Anyone building or experimenting with multimodal AI models for creative or analytical tasks would find it useful.
Use this if you need to develop, fine-tune, or deploy models that combine understanding and generation across images, text, and video, such as creating images from descriptions or extracting data from visual documents.
Not ideal if you are looking for a simple, off-the-shelf application without any development or technical configuration.
Stars: 718
Forks: 224
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 06, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/PaddlePaddle/PaddleMIX"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
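The curl command above can also be wrapped in a few lines of Python. This is a minimal sketch: the response is assumed to be JSON, and the field names (`repo`, `stars`, `forks`) are assumptions mirroring the stats shown on this page, not a documented schema.

```python
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/diffusion"

def fetch_quality(repo: str) -> dict:
    """Fetch quality data for a repo (keyless tier: 100 requests/day)."""
    with urlopen(f"{API_BASE}/{repo}") as resp:
        return json.load(resp)

def summarize(payload: dict) -> str:
    """One-line summary; keys are assumed, so missing fields get defaults."""
    return (f"{payload.get('repo', '?')}: "
            f"{payload.get('stars', 0)} stars, {payload.get('forks', 0)} forks")

# Hypothetical sample payload matching the stats listed above,
# so the parsing path can be exercised without a live request.
sample = {"repo": "PaddlePaddle/PaddleMIX", "stars": 718, "forks": 224}
print(summarize(sample))  # PaddlePaddle/PaddleMIX: 718 stars, 224 forks
```

For a live call, `fetch_quality("PaddlePaddle/PaddleMIX")` returns the parsed response; how an API key is attached (header or query parameter) is not documented here, so check the service's docs before relying on the keyed tier.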
Related models
UCSC-VLAA/story-iter
[ICLR 2026] A Training-free Iterative Framework for Long Story Visualization
keivalya/mini-vla
a minimal, beginner-friendly VLA to show how robot policies can fuse images, text, and states to...
adobe-research/custom-diffusion
Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)
byliutao/1Prompt1Story
🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation...
zai-org/ImageReward
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation