jingyaogong/minimind-v
🚀 Train a 26M-parameter visual multimodal VLM from scratch in just 1 hour! 🌏
This project helps AI developers and researchers quickly train their own specialized visual language models (VLMs) that can understand images and engage in conversation about them. It takes a base language model and image data to produce a compact VLM capable of image description and dialogue. This is ideal for those who want to build a small-scale, customizable visual AI without extensive resources.
6,712 stars. Actively maintained with 16 commits in the last 30 days.
Use this if you are an AI developer or researcher looking to experiment with training visual language models from scratch on a single GPU with minimal cost and time.
Not ideal if you need a ready-to-use, highly generalized visual language model for immediate deployment without any training or development work.
Stars: 6,712
Forks: 736
Language: Python
License: Apache-2.0
Last pushed: Feb 04, 2026
Commits (30d): 16
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jingyaogong/minimind-v"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
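For a quick check without curl, a minimal Python sketch of the same request is below. It assumes the endpoint returns JSON; the exact fields in the response are not documented here and should be inspected from the actual payload.

```python
# Minimal sketch: fetch the quality data for jingyaogong/minimind-v.
# Assumes the endpoint responds with JSON (not confirmed here).
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jingyaogong/minimind-v"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# Pretty-print whatever the API returns.
print(json.dumps(data, indent=2))
```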
Related tools
SkyworkAI/Skywork-R1V
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in...
roboflow/vision-ai-checkup
Take your LLM to the optometrist.
zai-org/GLM-TTS
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
NExT-GPT/NExT-GPT
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
EvolvingLMMs-Lab/NEO
NEO Series: Native Vision-Language Models from First Principles