thu-nics/FrameFusion

[ICCV'25] The official code for the paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models"

Score: 38 / 100 (Emerging)

FrameFusion speeds up video processing in Large Vision-Language Models (LVLMs) by reducing the number of video tokens the model has to handle, using both token similarity and token importance to decide what to keep. Fewer tokens mean lower compute cost and less delay when the LVLM generates responses or analyses from a video. It is aimed at researchers and developers who work with video data and LVLMs and want better computational performance.
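As a rough illustration of the idea named in the paper title, the sketch below reduces video tokens in two passes: it first drops tokens that are near-duplicates of the token at the same position in the previous frame (similarity), then keeps only the highest-scoring survivors (importance). This is a toy example and not FrameFusion's actual implementation. The function name `reduce_tokens`, the threshold values, the position-aligned frame layout, and the externally supplied importance scores are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reduce_tokens(frames, importance, sim_threshold=0.9, keep=4):
    """Toy token reduction (hypothetical helper, not the FrameFusion API).

    frames[t][p]     -> feature vector of the token at position p in frame t
    importance[t][p] -> assumed precomputed importance score for that token
    Returns the (t, p) indices of the tokens that survive, in original order.
    """
    survivors = []
    for t, frame in enumerate(frames):
        for p, vec in enumerate(frame):
            # Similarity pass: skip a token that closely matches the token at
            # the same position in the previous frame.
            if t > 0 and cosine(vec, frames[t - 1][p]) >= sim_threshold:
                continue
            survivors.append((t, p))

    # Importance pass: keep the `keep` highest-scoring survivors.
    ranked = sorted(survivors, key=lambda tp: importance[tp[0]][tp[1]], reverse=True)
    return sorted(ranked[:keep])
```

For example, with two frames of two tokens each, `reduce_tokens([[[1, 0], [0, 1]], [[1, 0], [1, 1]]], [[0.2, 0.9], [0.5, 0.7]], keep=2)` drops the repeated `[1, 0]` token in the second frame and then keeps the two remaining tokens with the highest scores. A real system would typically merge similar tokens rather than discard them and would derive importance from the model itself; both are simplified away here.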

Use this if you are working with video data and Large Vision-Language Models and need to significantly speed up processing and reduce computational costs.

Not ideal if you are a general user looking for a ready-to-use application, as this requires technical setup within a development environment.

video-processing large-language-models computational-efficiency machine-learning-engineering AI-model-optimization
No package, no dependents
Maintenance 10 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 3 / 25

Stars: 71
Forks: 1
Language: Python
License: MIT
Last pushed: Jan 13, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/thu-nics/FrameFusion"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
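The same request can be made from Python using only the standard library. The sketch below rebuilds the URL from the curl example and returns the raw response body as text. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and the response format is not documented on this page, so the body is not parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; everything after it is built from arguments.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the quality endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the endpoint and return the response body as text (format unverified)."""
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_quality("diffusion", "thu-nics", "FrameFusion")` issues the same request as the curl command. With the keyless limit of 100 requests/day, it is worth caching the result rather than calling it in a loop.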