rese1f/MovieChat

[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

/ 100

Emerging

MovieChat helps you understand the content of very long videos, such as feature films or extended recordings, while using significantly less computing power than traditional methods. It takes long video files as input and returns summaries or answers to questions about the video's content. It is aimed at researchers and developers working on video analysis, content moderation, or AI assistants that need to process extensive video data.

688 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to analyze extremely long videos, potentially thousands of frames, with limited GPU memory and resources.

Not ideal if you primarily need real-time processing of short clips, or if you are not working with large-scale video understanding models.

video-analysis long-form-content AI-assistant-development multimodal-AI content-understanding

Stale 6m

Maintenance 0 / 25

Adoption 10 / 25

Maturity 25 / 25

Community 14 / 25

Stars

688

Forks

Language

Python

License

BSD-3-Clause

Higher-rated alternatives

TinyLLaVA/TinyLLaVA_Factory

A Framework of Small-scale Large Multimodal Models

zjunlp/EasyInstruct

[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.

haotian-liu/LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

NVlabs/Eagle

Eagle: Frontier Vision-Language Models with Data-Centric Strategies

DAMO-NLP-SG/Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
