ictnlp/LLaVA-Mini
LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.
This project offers a unified large multimodal model that efficiently processes and understands both images and videos. It takes visual inputs (still images or video clips) and produces detailed descriptions or answers to questions about their content. Researchers and developers who use large language models to analyze visual data will find it useful when inference speed and computational cost matter.
562 stars. No commits in the last 6 months.
Use this if you need a highly efficient model to integrate visual understanding into your large language model applications, particularly for scenarios requiring fast processing of images and long videos with minimal computational resources.
Not ideal if you are looking for a simple, out-of-the-box application for everyday visual analysis and do not want to integrate a model into a larger language-model system or manage its computational resources.
Stars: 562
Forks: 30
Language: Python
License: Apache-2.0
Category:
Last pushed: Jun 29, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ictnlp/LLaVA-Mini"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
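For programmatic access, the same endpoint can be queried from Python. Below is a minimal sketch using the requests library; it assumes the endpoint returns JSON (the exact response fields are not documented on this page), so it simply pretty-prints whatever comes back.

import json
import requests

# Endpoint from the curl example above; no API key is needed for up to 100 requests/day.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/ictnlp/LLaVA-Mini"

resp = requests.get(URL, timeout=30)
resp.raise_for_status()  # fail loudly on HTTP errors

# Assumption: the body is JSON; field names are not listed on this page,
# so the sketch just pretty-prints the parsed payload.
print(json.dumps(resp.json(), indent=2))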
Higher-rated alternatives
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
zjunlp/EasyInstruct
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
NVlabs/Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies