rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
MovieChat summarizes and answers questions about very long videos, such as feature films or extended recordings, while using significantly less compute than approaches that keep every frame's tokens in memory. It takes a long video file as input and returns a summary or an answer to a question about the video's content. It is aimed at researchers and developers working on video analysis, content moderation, or AI assistants that must process extensive video data.
688 stars. No commits in the last 6 months. Available on PyPI.
Use this if you need to analyze extremely long videos, potentially thousands of frames, with limited GPU memory and resources.
Not ideal if you mainly need real-time processing of short clips, or if you are not working with large-scale video understanding models.
Stars: 688
Forks: 43
Language: Python
License: BSD-3-Clause
Category:
Last pushed: Jan 29, 2025
Commits (30d): 0
Dependencies: 51
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/rese1f/MovieChat"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
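The same request can be made from Python. The sketch below is illustrative only: the helper names `quality_url` and `fetch_quality` are hypothetical, the path layout (`quality/<category>/<owner>/<repo>`) is inferred from the single URL shown above, and a JSON response body is assumed rather than documented here.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repo given as 'owner/name'.

    Assumption: the segment before the owner ("transformers" in the
    example) is a category slug.
    """
    owner, name = repo.split("/", 1)
    return f"{BASE_URL}/{quote(category)}/{quote(owner)}/{quote(name)}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> dict:
    """GET the endpoint and parse the body as JSON (assumed format)."""
    url = quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the request.
    print(quality_url("transformers", "rese1f/MovieChat"))
```

Since keyless access is limited to 100 requests/day, it is worth caching the result of `fetch_quality` rather than calling it on every run.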
Higher-rated alternatives
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
zjunlp/EasyInstruct
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
NVlabs/Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding