jingyaogong/minimind-v
🚀 Train a 26M-parameter visual multimodal VLM from scratch in just 1 hour! 🌏
This project helps AI developers and researchers quickly train their own specialized visual language models (VLMs) that can understand images and engage in conversation about them. It takes a base language model and image data to produce a compact VLM capable of image description and dialogue. This is ideal for those who want to build a small-scale, customizable visual AI without extensive resources.
6,712 stars. Actively maintained with 16 commits in the last 30 days.
Use this if you are an AI developer or researcher looking to experiment with training visual language models from scratch on a single GPU with minimal cost and time.
Not ideal if you need a ready-to-use, highly generalized visual language model for immediate deployment without any training or development work.
Stars: 6,712
Forks: 736
Language: Python
License: Apache-2.0
Last pushed: Feb 04, 2026
Commits (30d): 16
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jingyaogong/minimind-v"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
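For a quick check without curl, a minimal Python sketch of the same request is below. It assumes the endpoint returns JSON; the exact fields in the response are not documented here and should be inspected from the actual payload.

```python
# Minimal sketch: fetch the quality data for jingyaogong/minimind-v.
# Assumes the endpoint responds with JSON (not confirmed here).
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jingyaogong/minimind-v"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# Pretty-print whatever the API returns.
print(json.dumps(data, indent=2))
```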
Related tools
SkyworkAI/Skywork-R1V
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in...
roboflow/vision-ai-checkup
Take your LLM to the optometrist.
zai-org/GLM-TTS
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
NExT-GPT/NExT-GPT
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
EvolvingLMMs-Lab/NEO
NEO Series: Native Vision-Language Models from First Principles