shufangxun/LLaVA-MoD

[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

Quality score: 38 / 100 (Emerging)

This project helps machine learning engineers and researchers create smaller, more efficient Multimodal Large Language Models (MLLMs) that can understand both images and text. It uses a large, capable MLLM as the teacher and distills its knowledge into a 'tiny' MLLM that retains strong performance while needing significantly fewer computational resources. This makes it well suited to deploying advanced vision-language AI in resource-constrained environments.

223 stars. No commits in the last 6 months.

Use this if you need to build powerful AI models that can interpret both images and text, but require them to be compact and run efficiently on limited hardware.

Not ideal if you primarily work with text-only or image-only AI models, or if computational resources are not a significant constraint for your deployments.

multimodal-AI AI-efficiency model-compression edge-AI-deployment computer-vision-language
Stale (6m) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 12 / 25
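The overall score appears to be the sum of the four category scores, each out of 25: 0 + 10 + 16 + 12 = 38 / 100.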


Stars: 223
Forks: 16
Language: Python
License: Apache-2.0
Last pushed: Mar 31, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/shufangxun/LLaVA-MoD"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
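For scripted access, the same endpoint can be queried from Python. The snippet below is a minimal sketch using the requests library; the response fields are not documented in this listing, so it simply pretty-prints whatever JSON the API returns.

import json
import requests

# Quality-score endpoint for this repository (same URL as the curl example above).
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/shufangxun/LLaVA-MoD"

# No API key is needed for up to 100 requests/day; a free key raises the limit to 1,000/day.
response = requests.get(url, timeout=10)
response.raise_for_status()

# The response schema is not documented here, so just pretty-print whatever comes back.
print(json.dumps(response.json(), indent=2))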