OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o's performance.
This project offers a family of open-source multimodal large language models (MLLMs) that can understand and respond to both images and text. You can input various types of visual content, like photos or diagrams, along with your text questions or commands, and receive detailed, intelligent textual responses. It's designed for AI developers, researchers, and engineers who build applications requiring advanced visual and linguistic comprehension.
9,879 stars. No commits in the last 6 months.
Use this if you are developing AI applications that need to process and reason about visual information and text together, and you require high-performing, open-source models comparable to leading commercial alternatives.
Not ideal if you are looking for a pre-built, consumer-ready application rather than a foundational model suite for development.
Stars: 9,879
Forks: 764
Language: Python
License: MIT
Category:
Last pushed: Sep 22, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/OpenGVLab/InternVL"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
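The same endpoint can be called from code. A minimal Python sketch, assuming the endpoint returns JSON (the response field names are not documented on this page, so the parsing step is an assumption):

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Build the quality-data endpoint URL for a given repository."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch quality data for a repo.

    Assumes the endpoint returns a JSON body; the schema is not
    documented here, so inspect the result before relying on fields.
    """
    with urllib.request.urlopen(build_url(owner, repo)) as resp:
        return json.load(resp)


print(build_url("OpenGVLab", "InternVL"))
```

For rates above the anonymous 100 requests/day, the free key mentioned above would presumably be passed as a header or query parameter; the page does not specify which.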
Higher-rated alternatives
jingyaogong/minimind-v
🚀 Train a 26M-parameter multimodal vision-language model (VLM) from scratch in just 1 hour! 🌏
roboflow/vision-ai-checkup
Take your LLM to the optometrist.
SkyworkAI/Skywork-R1V
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in...
zai-org/GLM-TTS
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
NExT-GPT/NExT-GPT
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model