aihao2000/DPN-LLaVA
arXiv 2025: Dynamic Pyramid Network for Efficient Multimodal Large Language Model
This project helps developers make Multimodal Large Language Models (MLLMs) run more efficiently. It takes an existing MLLM and image data and produces a faster MLLM that retains its ability to understand fine-grained visual detail. AI/ML engineers and researchers who build or deploy MLLMs will find it useful.
Use this if you need to cut the computational cost and improve the inference speed of your MLLMs without significantly sacrificing their fine-grained image understanding.
Not ideal if you are not working with MLLMs or if model efficiency is not your primary concern.
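The speed-up in approaches like this comes from shrinking the visual token sequence inside the language model so that deeper, more expensive layers process fewer tokens. Below is a minimal conceptual sketch of that idea using plain spatial average pooling in PyTorch; it is not the repository's actual code, and the grid size, hidden size, and pooling ratios are illustrative assumptions:

import torch
import torch.nn.functional as F

def pool_visual_tokens(visual_tokens: torch.Tensor, ratio: int) -> torch.Tensor:
    # visual_tokens: (batch, num_tokens, hidden); num_tokens is a flattened
    # square patch grid, e.g. 24 x 24 = 576 tokens from a ViT encoder.
    b, n, d = visual_tokens.shape
    side = int(n ** 0.5)
    grid = visual_tokens.view(b, side, side, d).permute(0, 3, 1, 2)  # (b, d, h, w)
    pooled = F.avg_pool2d(grid, kernel_size=ratio)                   # merge neighbouring tokens
    return pooled.flatten(2).transpose(1, 2)                         # (b, n / ratio**2, d)

tokens = torch.randn(1, 576, 4096)          # assumed LLaVA-like token grid and hidden size
print(pool_visual_tokens(tokens, 2).shape)  # torch.Size([1, 144, 4096]), e.g. for mid layers
print(pool_visual_tokens(tokens, 4).shape)  # torch.Size([1, 36, 4096]), e.g. for deep layers

Where to pool, and how aggressively, is the design question such pyramid-style methods address; refer to the paper for the project's actual scheme.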
Stars
44
Forks
5
Language
Python
License
Apache-2.0
Category
Transformers
Last pushed
Dec 31, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/aihao2000/DPN-LLaVA"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
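A minimal sketch of fetching the same data from Python; the response field names in the comment are assumptions, not a documented schema:

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/aihao2000/DPN-LLaVA"
resp = requests.get(url, timeout=10)  # no key needed within the 100 requests/day limit
resp.raise_for_status()
data = resp.json()                    # e.g. stars, forks, license, last_pushed (assumed fields)
print(data)

How a free API key is passed (header vs. query parameter) is not documented here, so it is omitted from the sketch.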
Higher-rated alternatives
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
zjunlp/EasyInstruct
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
NVlabs/Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies