jingyaogong/minimind-v

🚀 Train a 26M-parameter visual multimodal VLM from scratch in just 1 hour!

Overall score: 63 / 100 (Established)

This project helps AI developers and researchers quickly train their own specialized visual language models (VLMs) that can understand images and engage in conversation about them. It takes a base language model and image data to produce a compact VLM capable of image description and dialogue. This is ideal for those who want to build a small-scale, customizable visual AI without extensive resources.

6,712 stars. Actively maintained with 16 commits in the last 30 days.

Use this if you are an AI developer or researcher looking to experiment with training visual language models from scratch on a single GPU with minimal cost and time.

Not ideal if you need a ready-to-use, highly generalized visual language model for immediate deployment without any training or development work.

Tags: AI-model-training, visual-language-processing, multi-modal-AI, custom-AI-development, deep-learning-research
No Package · No Dependents
Maintenance: 17 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 20 / 25

These four category scores (each out of 25) sum to the overall 63 / 100: 17 + 10 + 16 + 20 = 63.


Stars: 6,712
Forks: 736
Language: Python
License: Apache-2.0
Last pushed: Feb 04, 2026
Commits (30d): 16

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jingyaogong/minimind-v"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
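
If you'd rather call the endpoint from code, here is a minimal sketch in Python using only the standard library. The response schema is not documented on this page, so rather than assuming field names the script simply pretty-prints whatever JSON the endpoint returns.

import json
import urllib.request

# Quality-score endpoint for jingyaogong/minimind-v
# (same URL as the curl example above).
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jingyaogong/minimind-v"

# Fetch and decode the JSON response. No API key is needed
# within the 100 requests/day free tier.
with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.loads(resp.read().decode("utf-8"))

# The schema is unspecified on this page, so just pretty-print the payload.
print(json.dumps(data, indent=2, ensure_ascii=False))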