X-PLUG/mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
The mPLUG-Owl family of models helps developers build AI applications that understand both images and text. You can feed it various forms of multimedia, such as images, videos, and accompanying text, and it generates relevant textual responses or descriptions. AI engineers and researchers working on advanced multimodal systems would use this to power their applications.
2,540 stars. No commits in the last 6 months.
Use this if you are a developer looking to integrate advanced AI capabilities that can process and understand information from both images and text, including long sequences of images.
Not ideal if you are a non-technical user looking for a ready-to-use application, as this project provides foundational models for development rather than end-user software.
Stars: 2,540
Forks: 189
Language: Python
License: MIT
Category:
Last pushed: Apr 02, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/X-PLUG/mPLUG-Owl"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
zjunlp/EasyInstruct
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
NVlabs/Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies