PaddlePaddle/PaddleMIX

PaddleMIX (Paddle Multimodal Integration and eXploration) supports mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion model toolbox, with a focus on high performance and flexibility.

61 / 100 (Established)

This project helps creators, marketers, and researchers develop advanced AI applications that understand and generate content across images, text, and video. You supply media and text descriptions, and it produces outputs such as images generated from text, controlled video animations, or information extracted from complex documents. It is useful for anyone who wants to build or experiment with sophisticated multimodal AI models for creative or analytical tasks.

718 stars.

Use this if you need to develop, fine-tune, or deploy AI models that combine understanding and generation across images, text, and video, for example to create images from descriptions or extract data from visual documents.

Not ideal if you are looking for a simple, off-the-shelf application without any development or technical configuration.

Tags: AI content creation, multimodal AI, video generation, document understanding, digital media production
No Package, No Dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 25 / 25
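
The four sub-scores above add up to the headline 61 / 100. The page does not state how the overall score is derived, so treating it as a plain sum is an assumption; the short check below only confirms that the listed numbers are consistent with that reading.

```python
# Sub-scores copied from the listing; each is out of 25.
sub_scores = {
    "Maintenance": 10,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 25,
}

# Assumption: the overall score is the plain sum of the four sub-scores.
total = sum(sub_scores.values())
maximum = 25 * len(sub_scores)

print(f"{total} / {maximum}")  # prints "61 / 100"
```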

Stars: 718
Forks: 224
Language: Python
License: Apache-2.0
Last pushed: Mar 06, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/PaddlePaddle/PaddleMIX"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
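
The same request can be made from Python. The sketch below mirrors the curl command using only the standard library. The helper names (`build_quality_url`, `fetch_quality`) are hypothetical, the URL pattern of category, owner, and repository is inferred from the single example above, and the assumption that the endpoint returns JSON is not confirmed by the page.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example in the listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality endpoint URL (hypothetical helper).

    Assumes the path layout is <category>/<owner>/<repo>, as in the
    one example shown on the page.
    """
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record without an API key (hypothetical helper).

    Assumes the response body is JSON; the schema is not documented here,
    so the parsed object is returned as-is.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("diffusion", "PaddlePaddle", "PaddleMIX")` would issue the same GET request as the curl command. Keyless access is limited to 100 requests/day, so cache results rather than polling.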