EvolvingLMMs-Lab/LLaVA-OneVision-1.5
Fully Open Framework for Democratized Multimodal Training
This framework helps AI developers and researchers build and train advanced Large Multimodal Models (LMMs) that understand both images and text. You feed it diverse image-text datasets, and it produces highly performant LMMs that accurately interpret visual information at native image resolution. It is aimed at AI practitioners building cutting-edge multimodal applications.
Use this if you need to train your own state-of-the-art LMMs with superior visual understanding and want a cost-efficient, fully open-source framework.
Not ideal if you are looking to use a pre-trained LMM off-the-shelf without custom training or if you lack the technical expertise for model development.
Stars: 762
Forks: 61
Language: Python
License: Apache-2.0
Category:
Last pushed: Dec 27, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/EvolvingLMMs-Lab/LLaVA-OneVision-1.5"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
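The endpoint returns JSON, so any HTTP client works. A minimal Python sketch using only the standard library; the response schema is not documented here, so this just fetches and pretty-prints whatever the API returns:

import json
import urllib.request

# Same endpoint as the curl example above.
URL = (
    "https://pt-edge.onrender.com/api/v1/quality/llm-tools/"
    "EvolvingLMMs-Lab/LLaVA-OneVision-1.5"
)

# Fetch and decode the JSON payload. Field names are not documented here,
# so inspect the output to learn the actual schema before relying on it.
with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))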
Higher-rated alternatives
jingyaogong/minimind-v
🚀 Train a 26M-parameter vision multimodal VLM from scratch in just 1 hour!
roboflow/vision-ai-checkup
Take your LLM to the optometrist.
SkyworkAI/Skywork-R1V
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in...
zai-org/GLM-TTS
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
NExT-GPT/NExT-GPT
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model