OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o's performance.
This project offers a family of open-source multimodal large language models (MLLMs) that can understand and respond to both images and text. You can input various types of visual content, like photos or diagrams, along with your text questions or commands, and receive detailed, intelligent textual responses. It's designed for AI developers, researchers, and engineers who build applications requiring advanced visual and linguistic comprehension.
9,879 stars. No commits in the last 6 months.
Use this if you are developing AI applications that need to process and reason about visual information and text together, and you require high-performing, open-source models comparable to leading commercial alternatives.
Not ideal if you are looking for a pre-built, consumer-ready application rather than a foundational model suite for development.
Stars: 9,879
Forks: 764
Language: Python
License: MIT
Category:
Last pushed: Sep 22, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/OpenGVLab/InternVL"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
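The same endpoint can be called from code. A minimal Python sketch, assuming the endpoint returns JSON (the response field names are not documented on this page, so the parsing step is an assumption):

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Build the quality-data endpoint URL for a given repository."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch quality data for a repo.

    Assumes the endpoint returns a JSON body; the schema is not
    documented here, so inspect the result before relying on fields.
    """
    with urllib.request.urlopen(build_url(owner, repo)) as resp:
        return json.load(resp)


print(build_url("OpenGVLab", "InternVL"))
```

For rates above the anonymous 100 requests/day, the free key mentioned above would presumably be passed as a header or query parameter; the page does not specify which.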
Higher-rated alternatives
jingyaogong/minimind-v
🚀 Train a 26M-parameter multimodal vision-language model (VLM) from scratch in just 1 hour! 🌏
roboflow/vision-ai-checkup
Take your LLM to the optometrist.
SkyworkAI/Skywork-R1V
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in...
zai-org/GLM-TTS
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
NExT-GPT/NExT-GPT
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model