vbdi/divprune
[CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
This project helps researchers and developers run Large Multimodal Models (LMMs) more efficiently. It takes an existing LMM, such as LLaVA, and prunes redundant visual tokens, yielding a leaner model. It is aimed at machine learning engineers and AI researchers who want to cut the inference cost of their vision-language models (see the sketch below).
Use this if you are developing or experimenting with large multimodal models and need to reduce their computational cost, memory footprint, or inference latency without significantly sacrificing performance.
Not ideal if you are an end-user looking for a pre-built application or a simple API to use LMMs, rather than modifying their core architecture.
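As the paper title suggests, the pruning is diversity-based: rather than scoring visual tokens by attention, it keeps a subset of tokens that are maximally dissimilar to one another. The snippet below is a minimal sketch of a greedy max-min diversity selection in that spirit; the function name, the seeding choice, and the token shapes are illustrative assumptions, not the repo's actual API.

import torch

def divprune_select(visual_tokens: torch.Tensor, keep: int) -> torch.Tensor:
    """Greedy max-min diversity selection over visual token embeddings.

    visual_tokens: (N, D) tensor of visual token features.
    keep: number of tokens to retain (keep <= N).
    Returns indices of the selected tokens, sorted to preserve order.
    """
    feats = torch.nn.functional.normalize(visual_tokens.float(), dim=-1)
    # Pairwise cosine distance: 1 - cosine similarity.
    dist = 1.0 - feats @ feats.T  # (N, N)

    selected = [0]  # seed with the first token (illustrative choice)
    # min_dist[i] = distance from token i to its nearest already-selected token
    min_dist = dist[0].clone()

    for _ in range(keep - 1):
        min_dist[selected] = -1.0            # never re-pick a selected token
        nxt = int(torch.argmax(min_dist))    # token farthest from the current subset
        selected.append(nxt)
        min_dist = torch.minimum(min_dist, dist[nxt])

    return torch.tensor(sorted(selected))

# Example: keep 64 of 576 LLaVA-style visual tokens before they enter the language model.
tokens = torch.randn(576, 1024)
kept_idx = divprune_select(tokens, keep=64)
pruned_tokens = tokens[kept_idx]

The greedy farthest-point strategy shown here is the standard heuristic for max-min diversity problems; the paper's exact formulation and integration point in the LLaVA pipeline may differ, so treat this as a conceptual illustration only.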
Stars: 71
Forks: 2
Language: Python
License: —
Category: —
Last pushed: Dec 01, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/vbdi/divprune"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
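The same endpoint can be queried from Python. This sketch mirrors the curl command above; it assumes the endpoint returns JSON and omits the optional API key.

import requests

# Fetch the quality data for vbdi/divprune (no key needed for the free tier).
resp = requests.get(
    "https://pt-edge.onrender.com/api/v1/quality/transformers/vbdi/divprune",
    timeout=10,
)
resp.raise_for_status()
print(resp.json())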
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice