visresearch/LLaVA-STF

The official implementation of "Learning Compact Vision Tokens for Efficient Large Multimodal Models"

Score: 30 / 100 (Emerging)

This project helps AI researchers and machine learning engineers working with large multimodal models (LMMs) accelerate inference. It compresses the visual input of existing LLaVA-1.5 models into fewer, more compact vision tokens, yielding faster vision-language reasoning without sacrificing accuracy. It is aimed at anyone developing and deploying LMMs where computational efficiency is critical.
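
The repository name (STF, presumably spatial token fusion) and the paper title suggest that spatially adjacent vision tokens are merged into fewer, more compact ones before reaching the language model. Below is a minimal illustrative sketch of that general idea in PyTorch; the 2x2 window, the concatenate-then-project operator, and all names here are assumptions for illustration, not the repository's actual module.

import torch
import torch.nn as nn

class SpatialTokenFusion(nn.Module):
    """Toy sketch: fuse each 2x2 window of vision tokens into one token.

    Illustrative assumption only; LLaVA-STF's real module may use a
    different window size, fusion operator, or learned weighting.
    """

    def __init__(self, dim: int, window: int = 2):
        super().__init__()
        self.window = window
        # Project the concatenated window back to the original hidden size.
        self.proj = nn.Linear(dim * window * window, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, H*W, dim) from the vision encoder; H == W assumed.
        b, n, d = tokens.shape
        h = w = int(n ** 0.5)
        k = self.window
        x = tokens.view(b, h, w, d)
        # Group each k x k spatial window and concatenate along channels.
        x = x.view(b, h // k, k, w // k, k, d)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(b, (h // k) * (w // k), k * k * d)
        return self.proj(x)  # 4x fewer tokens for k == 2

# Example: 576 CLIP tokens (24x24 grid) -> 144 fused tokens.
fusion = SpatialTokenFusion(dim=1024)
out = fusion(torch.randn(1, 576, 1024))
print(out.shape)  # torch.Size([1, 144, 1024])

Shrinking the token sequence this way cuts the LLM's attention cost roughly quadratically in the number of vision tokens, which is where the inference speedup would come from.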

No commits in the last 6 months.

Use this if you are a researcher or engineer who wants LMMs like LLaVA-1.5 to process visual input and return answers significantly faster.

Not ideal if you are a general user looking for an off-the-shelf multimodal AI application; this is a development tool for improving existing LMMs.

multimodal-ai-development large-language-models ai-inference-optimization computer-vision-engineering ai-model-efficiency
Stale 6m · No Package · No Dependents
Maintenance: 2 / 25
Adoption: 7 / 25
Maturity: 15 / 25
Community: 6 / 25


Stars: 29
Forks: 2
Language: Python
License: not listed
Last pushed: Jun 11, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/visresearch/LLaVA-STF"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
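
If you prefer Python over curl, here is a minimal sketch using only the standard library. The response schema is not documented on this card, so the code simply dumps whatever JSON the endpoint returns rather than assuming field names.

import json
import urllib.request

# Public endpoint from the curl command above; no key needed up to 100 requests/day.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/visresearch/LLaVA-STF"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)  # assumes the endpoint returns a JSON body

# Inspect the raw payload to see which fields are available.
print(json.dumps(data, indent=2))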