visresearch/LLaVA-STF
The official implementation of "Learning Compact Vision Tokens for Efficient Large Multimodal Models"
This project helps AI researchers and machine learning engineers accelerate inference for large multimodal models (LMMs). It compresses the visual input of existing LLaVA-1.5 models into a smaller set of vision tokens, enabling faster visual-language reasoning without sacrificing accuracy. It is intended for those developing and deploying LMMs where computational efficiency is critical.
No commits in the last 6 months.
Use this if you are a researcher or engineer who wants LMMs such as LLaVA-1.5 to process visual input and produce answers significantly faster.
Not ideal if you are a general user looking for an off-the-shelf multimodal AI application; this is a development tool for improving existing LMMs.
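To give a concrete sense of what "compact vision tokens" means, the sketch below shows one common token-reduction strategy: fusing spatially adjacent patch tokens before they reach the language model. The function name, pooling window, and the use of average pooling are illustrative assumptions; the actual fusion scheme in LLaVA-STF is defined by the paper and this repository, not by this sketch.

# Generic illustration only: shrink the vision-token sequence an LMM such as
# LLaVA-1.5 must process by fusing each 2x2 block of patch tokens into one
# token (simple average pooling). LLaVA-STF's actual method may differ.
import torch
import torch.nn.functional as F

def fuse_vision_tokens(tokens: torch.Tensor, grid: int = 24, window: int = 2) -> torch.Tensor:
    """Fuse each window x window block of patch tokens into a single token.

    tokens: (batch, grid*grid, dim) patch embeddings from the vision encoder.
    returns: (batch, (grid // window) ** 2, dim) compact vision tokens.
    """
    b, n, d = tokens.shape
    assert n == grid * grid, "token count must match the patch grid"
    x = tokens.view(b, grid, grid, d).permute(0, 3, 1, 2)   # (b, d, grid, grid)
    x = F.avg_pool2d(x, kernel_size=window, stride=window)  # (b, d, grid/w, grid/w)
    return x.flatten(2).transpose(1, 2)                     # (b, n / w**2, d)

if __name__ == "__main__":
    patch_tokens = torch.randn(1, 576, 1024)   # 24x24 grid, as produced by LLaVA-1.5's CLIP ViT-L/14-336 encoder
    compact = fuse_vision_tokens(patch_tokens)
    print(compact.shape)                       # torch.Size([1, 144, 1024]): 4x fewer tokens

With 4x fewer vision tokens, the language model's attention cost over the visual prefix drops accordingly, which is the kind of saving token-compaction methods target.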
Stars: 29
Forks: 2
Language: Python
License: —
Category: —
Last pushed: Jun 11, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/visresearch/LLaVA-STF"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
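If you prefer Python to curl, a minimal sketch using the requests package is shown below. It assumes the endpoint answers a plain GET with a JSON body, as the curl example suggests, and it does not use an API key.

# Minimal sketch: fetch the same quality data from Python.
# Assumes the `requests` package is installed and the endpoint returns JSON.
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/visresearch/LLaVA-STF"

resp = requests.get(URL, timeout=30)
resp.raise_for_status()   # surface HTTP errors (e.g., hitting the daily rate limit) early
print(resp.json())        # response fields are not documented here, so just print what comes back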
Higher-rated alternatives
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
zjunlp/EasyInstruct
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
NVlabs/Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies