pipixin321/Awesome-Video-MLLMs
:fire: :fire: :fire: Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding :video_camera:
This is a curated list of research papers, open-source models, and benchmarks on video understanding with multimodal large language models (MLLMs). It covers approaches for short, long, and streaming video. Researchers and AI practitioners working on video analysis and multimodal AI can use it to find and compare recent techniques.
No commits in the last 6 months.
Use this if you are a researcher or AI engineer who wants to explore, compare, and implement video understanding methods built on multimodal large language models.
Not ideal if you are an end user who wants a ready-to-use video analysis application and does not plan to do implementation or research work.
Stars: 62
Forks: 1
Language: —
License: —
Category
Last pushed: Sep 01, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/pipixin321/Awesome-Video-MLLMs"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
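The same request can be made from a script. The following is a minimal sketch using only the Python standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, the `transformers` path segment is copied from the URL above without knowing what other values are valid, and the response is assumed to be UTF-8 text since its format is not documented here.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner: str, repo: str, category: str = "transformers") -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters stay in their segment.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(owner: str, repo: str, category: str = "transformers",
                  timeout: float = 10.0) -> str:
    """Fetch the raw response body as text.

    Assumption: the body is UTF-8 text. No parsing is attempted because the
    response schema is not described in this listing.
    """
    with urlopen(quality_url(owner, repo, category), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_quality("pipixin321", "Awesome-Video-MLLMs")` issues the same GET request as the curl command and counts against the 100 requests/day keyless limit.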
Higher-rated alternatives
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles:Latest Advances on Multimodal Large Language Models
FoundationVision/Liquid
(Accepted by IJCV) Liquid: Language Models are Scalable and Unified Multi-modal Generators
Paranioar/Awesome_Matching_Pretraining_Transfering
The Paper List of Large Multi-Modality Model (Perception, Generation, Unification),...
Yangyi-Chen/Multimodal-AND-Large-Language-Models
Paper list about multimodal and large language models, only used to record papers I read in the...
thuml/AutoTimes
Official implementation for "AutoTimes: Autoregressive Time Series Forecasters via Large Language Models"