Letian2003/MM_INF
An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis (https://arxiv.org/abs/2503.08741).
This tool helps AI researchers and data scientists efficiently create high-quality, diverse multimodal instruction-following datasets. You provide images, and it automatically generates various instructions and corresponding responses, which are crucial for training advanced multimodal large language models (MLLMs). It automates much of the data synthesis process, allowing you to focus on model development.
No commits in the last 6 months.
Use this if you need to rapidly generate large, diverse datasets of image-based instructions and responses to train or fine-tune multimodal AI models, particularly when starting with only raw images.
Not ideal if you primarily work with text-only data, already have high-quality annotated multimodal datasets, or are looking for a simple API for existing MLLMs rather than a data generation pipeline.
Stars: 39
Forks: 2
Language: Python
License: —
Category: —
Last pushed: Jun 04, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Letian2003/MM_INF"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
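The endpoint above follows a simple path pattern. A minimal Python sketch of building the URL programmatically; treating the `transformers` segment as a category slug is an assumption (the API path is not documented here beyond the example):

```python
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    # Pattern taken from the curl example above. The "category" segment
    # (here "transformers") is an assumption, not a documented parameter.
    return f"{API_BASE}/{category}/{owner}/{repo}"

print(quality_url("transformers", "Letian2003", "MM_INF"))
```

Fetch the resulting URL with any HTTP client (e.g. the curl command shown above); unauthenticated access is limited to 100 requests/day.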
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice