thu-nics/FrameFusion

[ICCV'25] Official code for the paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models"


Emerging

FrameFusion makes Large Vision-Language Models (LVLMs) process video faster and at lower computational cost. As the paper title indicates, it does this by reducing the number of video tokens the model has to handle, using token similarity and token importance as signals, so the LVLM can generate responses or analyses with less delay. It is aimed at researchers and developers who run LVLMs on video data and want better computational performance.

Use this if you run Large Vision-Language Models on video data and need to speed up processing and reduce computational cost.

Not ideal if you are a general user looking for a ready-to-use application, as this requires technical setup within a development environment.
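
To make the idea of "combining similarity and importance" concrete, here is a minimal, illustrative sketch in plain Python. It is not FrameFusion's code or API: the function name `reduce_tokens`, the parameters `sim_threshold` and `keep_ratio`, and the two-step order (drop temporally redundant tokens, then keep the highest-importance survivors) are all assumptions made for illustration. A real system might, for instance, merge similar tokens rather than drop them.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def reduce_tokens(frames, importance, sim_threshold=0.9, keep_ratio=0.5):
    """Hypothetical token reduction; NOT the FrameFusion implementation.

    frames:     list of frames, each a list of token vectors (same layout per frame)
    importance: importance[t][p] is an assumed score for token p of frame t
    Returns the sorted (frame_index, position) pairs of the tokens that are kept.
    """
    # Step 1 (similarity): skip a token that is nearly identical to the token
    # at the same position in the previous frame.
    survivors = []
    for t, frame in enumerate(frames):
        for p, token in enumerate(frame):
            if t > 0 and cosine(token, frames[t - 1][p]) >= sim_threshold:
                continue
            survivors.append((t, p))

    # Step 2 (importance): keep only the top keep_ratio fraction of survivors.
    budget = max(1, math.ceil(len(survivors) * keep_ratio))
    ranked = sorted(survivors, key=lambda tp: importance[tp[0]][tp[1]], reverse=True)
    return sorted(ranked[:budget])
```

For example, with `frames = [[[1, 0], [0, 1]], [[1, 0], [1, 1]]]` and `importance = [[0.2, 0.9], [0.5, 0.7]]`, the first token of the second frame is dropped as a duplicate of the previous frame, and the two highest-scoring survivors are kept, so the call returns `[(0, 1), (1, 1)]`.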

video-processing large-language-models computational-efficiency machine-learning-engineering AI-model-optimization

No Package No Dependents

Maintenance 10 / 25

Adoption 9 / 25

Maturity 16 / 25

Community 3 / 25

Language: Python

License: MIT

Higher-rated alternatives

zai-org/CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

zhaorw02/DeepMesh

[ICCV 2025] Official code of DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

YangLing0818/RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with...

Yushi-Hu/tifa

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

OpenMeshLab/MeshXL

[NeurIPS 2024] MeshXL: Neural Coordinate Field for Generative 3D Foundation Models, a 3D...
