BAAI-DCAI/DataOptim
A collection of visual instruction tuning datasets.
This repository provides a streamlined collection of datasets for training Multimodal Large Language Models (MLLMs). It takes various image and text question-answering datasets, standardizes them, and outputs ready-to-use training data in a format suited to MLLM development. It is aimed at machine learning engineers and researchers who build or fine-tune MLLMs.
No commits in the last 6 months.
Use this if you are a machine learning engineer or researcher who needs pre-processed, high-quality visual instruction tuning datasets to train or fine-tune your Multimodal Large Language Models.
Not ideal if you are looking for a model to use directly, as this provides data for training models, not the models themselves.
Stars: 77
Forks: 3
Language: Python
License: MIT
Category:
Last pushed: Mar 14, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/BAAI-DCAI/DataOptim"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
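The same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the endpoint returns JSON, which the listing does not state, and the function names and the `category`, `owner`, and `repo` parameter names are hypothetical labels for the three path segments in the curl URL above.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example in this listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL from its three path segments.

    The segment names are assumptions made for illustration; the listing
    only shows the full URL, not what each segment means.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository and parse it.

    Assumption: the response body is JSON. If it is not, json.loads
    raises a ValueError and the raw body should be inspected instead.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body)


# Example call (performs a network request, counts against the daily limit):
# data = fetch_quality("ml-frameworks", "BAAI-DCAI", "DataOptim")
# print(json.dumps(data, indent=2))
```

Because unauthenticated access is limited to 100 requests/day, it is worth caching the result locally rather than calling the endpoint in a loop.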