thu-nics/FrameFusion
[ICCV'25] Official code for the paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models"
FrameFusion makes Large Vision-Language Models (LVLMs) process videos faster and at lower cost by reducing the number of video tokens the model has to handle, using both token similarity and token importance to decide what to keep. It takes a video as input and cuts the work the LVLM does on it, so responses and analyses are generated with less delay. It is aimed at researchers and developers who work with video data and LVLMs and want better computational performance.
Use this if you are working with video data and Large Vision-Language Models and need to significantly speed up processing and reduce computational costs.
Not ideal if you are a general user looking for a ready-to-use application, as this requires technical setup within a development environment.
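The idea named in the paper title, reducing tokens by combining similarity and importance, can be illustrated with a toy sketch. This is not the FrameFusion implementation: the function names, the same-position comparison against the previous frame, the cosine threshold, and the precomputed importance scores are all assumptions made for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reduce_tokens(frames, importance, sim_threshold=0.9, keep=4):
    """Toy token reduction (hypothetical, not the repository's API).

    frames:     list of frames, each a list of token vectors (same count per frame)
    importance: same shape as frames, one score per token (assumed to be given)
    Returns the (frame_index, token_index) pairs that are kept, in original order.
    """
    # Step 1 (similarity): drop a token that is nearly identical to the token
    # at the same position in the previous frame.
    survivors = []
    for f, frame in enumerate(frames):
        for t, token in enumerate(frame):
            if f > 0 and cosine(token, frames[f - 1][t]) >= sim_threshold:
                continue
            survivors.append((f, t))

    # Step 2 (importance): of the survivors, keep the `keep` highest-scoring tokens.
    ranked = sorted(survivors, key=lambda ft: importance[ft[0]][ft[1]], reverse=True)
    return sorted(ranked[:keep])


# Two frames with two 2-D tokens each; the first token does not change between frames.
frames = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 1.0]]]
importance = [[0.9, 0.1], [0.5, 0.8]]
kept = reduce_tokens(frames, importance, sim_threshold=0.9, keep=2)
```

In this example the unchanged token in the second frame is removed by the similarity step, and the importance step then keeps the two highest-scoring tokens among the rest.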
Stars: 71
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Jan 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/thu-nics/FrameFusion"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
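The same request can be made from Python. The sketch below uses only the standard library and the URL from the curl command above; the helper names are hypothetical, and it assumes the endpoint returns a JSON body, which this page does not confirm.

```python
import json
import urllib.request

# Base path taken from the curl command; the helpers below are illustrative names.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl command."""
    return f"{API_BASE}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and parse the body as JSON (assumed response format)."""
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("diffusion", "thu-nics/FrameFusion")
```

Unauthenticated calls count against the 100 requests/day limit, so it may be worth caching the result rather than fetching on every run.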
Higher-rated alternatives
zai-org/CogVideo
text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
zhaorw02/DeepMesh
[ICCV 2025] Official code of DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
YangLing0818/RPG-DiffusionMaster
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with...
Yushi-Hu/tifa
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
OpenMeshLab/MeshXL
[NeurIPS 2024] MeshXL: Neural Coordinate Field for Generative 3D Foundation Models, a 3D...