Letian2003/MM_INF
An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis (https://arxiv.org/abs/2503.08741).
This tool helps AI researchers and data scientists efficiently create high-quality, diverse multimodal instruction-following datasets. You provide images, and it automatically generates various instructions and corresponding responses, which are crucial for training advanced multimodal large language models (MLLMs). It automates much of the data synthesis process, allowing you to focus on model development.
No commits in the last 6 months.
Use this if you need to rapidly generate large, diverse datasets of image-based instructions and responses to train or fine-tune multimodal AI models, particularly when starting with only raw images.
Not ideal if you primarily work with text-only data, already have high-quality annotated multimodal datasets, or are looking for a simple API for existing MLLMs rather than a data generation pipeline.
Stars: 39
Forks: 2
Language: Python
License: —
Category: —
Last pushed: Jun 04, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Letian2003/MM_INF"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
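The endpoint above follows a simple path pattern. A minimal Python sketch of building the URL programmatically; treating the `transformers` segment as a category slug is an assumption (the API path is not documented here beyond the example):

```python
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    # Pattern taken from the curl example above. The "category" segment
    # (here "transformers") is an assumption, not a documented parameter.
    return f"{API_BASE}/{category}/{owner}/{repo}"

print(quality_url("transformers", "Letian2003", "MM_INF"))
```

Fetch the resulting URL with any HTTP client (e.g. the curl command shown above); unauthenticated access is limited to 100 requests/day.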
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice