visresearch/LLaVA-STF
The official implementation of "Learning Compact Vision Tokens for Efficient Large Multimodal Models"
This project helps AI researchers and machine learning engineers accelerate inference for large multimodal models (LMMs). It compresses the visual input of existing LLaVA-1.5 models into a smaller set of vision tokens, enabling faster visual-language reasoning without sacrificing accuracy. It is intended for those developing and deploying LMMs where computational efficiency is critical.
No commits in the last 6 months.
Use this if you are a researcher or engineer who wants LMMs such as LLaVA-1.5 to process visual input and produce answers significantly faster.
Not ideal if you are a general user looking for an off-the-shelf multimodal AI application; this is a development tool for improving existing LMMs.
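To give a concrete sense of what "compact vision tokens" means, the sketch below shows one common token-reduction strategy: fusing spatially adjacent patch tokens before they reach the language model. The function name, pooling window, and the use of average pooling are illustrative assumptions; the actual fusion scheme in LLaVA-STF is defined by the paper and this repository, not by this sketch.

# Generic illustration only: shrink the vision-token sequence an LMM such as
# LLaVA-1.5 must process by fusing each 2x2 block of patch tokens into one
# token (simple average pooling). LLaVA-STF's actual method may differ.
import torch
import torch.nn.functional as F

def fuse_vision_tokens(tokens: torch.Tensor, grid: int = 24, window: int = 2) -> torch.Tensor:
    """Fuse each window x window block of patch tokens into a single token.

    tokens: (batch, grid*grid, dim) patch embeddings from the vision encoder.
    returns: (batch, (grid // window) ** 2, dim) compact vision tokens.
    """
    b, n, d = tokens.shape
    assert n == grid * grid, "token count must match the patch grid"
    x = tokens.view(b, grid, grid, d).permute(0, 3, 1, 2)   # (b, d, grid, grid)
    x = F.avg_pool2d(x, kernel_size=window, stride=window)  # (b, d, grid/w, grid/w)
    return x.flatten(2).transpose(1, 2)                     # (b, n / w**2, d)

if __name__ == "__main__":
    patch_tokens = torch.randn(1, 576, 1024)   # 24x24 grid, as produced by LLaVA-1.5's CLIP ViT-L/14-336 encoder
    compact = fuse_vision_tokens(patch_tokens)
    print(compact.shape)                       # torch.Size([1, 144, 1024]): 4x fewer tokens

With 4x fewer vision tokens, the language model's attention cost over the visual prefix drops accordingly, which is the kind of saving token-compaction methods target.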
Stars: 29
Forks: 2
Language: Python
License: —
Category: —
Last pushed: Jun 11, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/visresearch/LLaVA-STF"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
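If you prefer Python to curl, a minimal sketch using the requests package is shown below. It assumes the endpoint answers a plain GET with a JSON body, as the curl example suggests, and it does not use an API key.

# Minimal sketch: fetch the same quality data from Python.
# Assumes the `requests` package is installed and the endpoint returns JSON.
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/visresearch/LLaVA-STF"

resp = requests.get(URL, timeout=30)
resp.raise_for_status()   # surface HTTP errors (e.g., hitting the daily rate limit) early
print(resp.json())        # response fields are not documented here, so just print what comes back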
Higher-rated alternatives
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
zjunlp/EasyInstruct
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
NVlabs/Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies