shufangxun/LLaVA-MoD

[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

Quality score: 38 / 100 (Emerging)

This project helps machine learning engineers and researchers create smaller, more efficient Multimodal Large Language Models (MLLMs) that can understand both images and text. It uses a large, capable MLLM as the teacher and distills its knowledge into a 'tiny' MLLM that retains strong performance while needing significantly fewer computational resources. This makes it well suited to deploying advanced vision-language AI in resource-constrained environments.

223 stars. No commits in the last 6 months.

Use this if you need to build powerful AI models that can interpret both images and text, but require them to be compact and run efficiently on limited hardware.

Not ideal if you primarily work with text-only or image-only AI models, or if computational resources are not a significant constraint for your deployments.

multimodal-AI AI-efficiency model-compression edge-AI-deployment computer-vision-language
Stale (6m) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 12 / 25
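The overall score appears to be the sum of the four category scores, each out of 25: 0 + 10 + 16 + 12 = 38 / 100.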


Stars: 223
Forks: 16
Language: Python
License: Apache-2.0
Last pushed: Mar 31, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/shufangxun/LLaVA-MoD"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
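For scripted access, the same endpoint can be queried from Python. The snippet below is a minimal sketch using the requests library; the response fields are not documented in this listing, so it simply pretty-prints whatever JSON the API returns.

import json
import requests

# Quality-score endpoint for this repository (same URL as the curl example above).
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/shufangxun/LLaVA-MoD"

# No API key is needed for up to 100 requests/day; a free key raises the limit to 1,000/day.
response = requests.get(url, timeout=10)
response.raise_for_status()

# The response schema is not documented here, so just pretty-print whatever comes back.
print(json.dumps(response.json(), indent=2))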