vbdi/divprune
[CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
This project helps researchers and developers run Large Multimodal Models (LMMs) more efficiently. It takes an existing LMM, such as LLaVA, and prunes redundant visual tokens, yielding a leaner model. It is aimed at machine learning engineers and AI researchers who want to cut the inference cost of their vision-language models (see the sketch below).
Use this if you are developing or experimenting with large multimodal models and need to reduce their computational cost, memory footprint, or inference latency without significantly sacrificing performance.
Not ideal if you are an end-user looking for a pre-built application or a simple API to use LMMs, rather than modifying their core architecture.
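As the paper title suggests, the pruning is diversity-based: rather than scoring visual tokens by attention, it keeps a subset of tokens that are maximally dissimilar to one another. The snippet below is a minimal sketch of a greedy max-min diversity selection in that spirit; the function name, the seeding choice, and the token shapes are illustrative assumptions, not the repo's actual API.

import torch

def divprune_select(visual_tokens: torch.Tensor, keep: int) -> torch.Tensor:
    """Greedy max-min diversity selection over visual token embeddings.

    visual_tokens: (N, D) tensor of visual token features.
    keep: number of tokens to retain (keep <= N).
    Returns indices of the selected tokens, sorted to preserve order.
    """
    feats = torch.nn.functional.normalize(visual_tokens.float(), dim=-1)
    # Pairwise cosine distance: 1 - cosine similarity.
    dist = 1.0 - feats @ feats.T  # (N, N)

    selected = [0]  # seed with the first token (illustrative choice)
    # min_dist[i] = distance from token i to its nearest already-selected token
    min_dist = dist[0].clone()

    for _ in range(keep - 1):
        min_dist[selected] = -1.0            # never re-pick a selected token
        nxt = int(torch.argmax(min_dist))    # token farthest from the current subset
        selected.append(nxt)
        min_dist = torch.minimum(min_dist, dist[nxt])

    return torch.tensor(sorted(selected))

# Example: keep 64 of 576 LLaVA-style visual tokens before they enter the language model.
tokens = torch.randn(576, 1024)
kept_idx = divprune_select(tokens, keep=64)
pruned_tokens = tokens[kept_idx]

The greedy farthest-point strategy shown here is the standard heuristic for max-min diversity problems; the paper's exact formulation and integration point in the LLaVA pipeline may differ, so treat this as a conceptual illustration only.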
Stars: 71
Forks: 2
Language: Python
License: —
Category: —
Last pushed: Dec 01, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/vbdi/divprune"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
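The same endpoint can be queried from Python. This sketch mirrors the curl command above; it assumes the endpoint returns JSON and omits the optional API key.

import requests

# Fetch the quality data for vbdi/divprune (no key needed for the free tier).
resp = requests.get(
    "https://pt-edge.onrender.com/api/v1/quality/transformers/vbdi/divprune",
    timeout=10,
)
resp.raise_for_status()
print(resp.json())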
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice