ictnlp/LLaVA-Mini

LLaVA-Mini is a unified large multimodal model (LMM) that supports efficient understanding of images, high-resolution images, and videos.

Quality score: 41 / 100 (Emerging)

This project offers a unified large multimodal model that efficiently processes and understands both images and videos. It takes visual inputs (still images or video clips) and produces detailed descriptions or answers to questions about the content. Researchers and developers who use large language models to analyze visual data will find it well suited to high-performance applications.

562 stars. No commits in the last 6 months.

Use this if you need a highly efficient model to integrate visual understanding into your large language model applications, particularly for scenarios requiring fast processing of images and long videos with minimal computational resources.

Not ideal if you are looking for a simple, out-of-the-box application for everyday visual analysis without needing to integrate it into a larger language model system or manage computational resources.

Tags: multimodal-AI, computer-vision, video-analysis, image-understanding, AI-development
Status: Stale (6 months), No Package, No Dependents
Maintenance: 2 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 13 / 25
The four category scores sum to the overall score: 2 + 10 + 16 + 13 = 41.


Stars: 562
Forks: 30
Language: Python
License: Apache-2.0
Last pushed: Jun 29, 2025
Commits (last 30 days): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ictnlp/LLaVA-Mini"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
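
If you prefer calling the endpoint from code rather than curl, here is a minimal Python sketch using the requests library. It only fetches and prints the returned JSON; the response schema is not documented on this page, so no specific field names are assumed.

import requests

# Quality API endpoint shown above (ictnlp/LLaVA-Mini).
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/ictnlp/LLaVA-Mini"

# Anonymous access is rate-limited to 100 requests/day; a free key
# raises that to 1,000/day (pass it however the service documents).
resp = requests.get(URL, timeout=10)
resp.raise_for_status()

# Print the raw JSON payload; adjust key access once the real
# response schema is known.
print(resp.json())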