MME-Benchmarks/Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
This project provides a comprehensive benchmark for evaluating how well Multi-modal Large Language Models (MLLMs) understand and analyze video content. It covers diverse video types, together with their subtitles and audio, and assesses an MLLM's ability to answer questions about each video; a minimal sketch of this evaluation flow appears below. It is aimed at researchers and developers building or improving AI models that interpret video.
Use this if you need to rigorously test the video understanding capabilities of your AI model across diverse real-world scenarios and different video lengths.
Not ideal if you are looking for an off-the-shelf tool for video content analysis without developing or evaluating an AI model.
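To make the evaluation flow above concrete, here is a minimal, hypothetical sketch of the kind of loop the benchmark supports, assuming a multiple-choice QA format. The file name, the field names (video_path, question, options, answer), and the ask_model function are illustrative placeholders, not the benchmark's actual API.

import json

def ask_model(video_path, question, options):
    # Placeholder: call your MLLM here and return one option letter, e.g. "A".
    raise NotImplementedError

with open("video_mme_qa.json") as f:  # hypothetical file of QA pairs
    qa_pairs = json.load(f)

correct = 0
for item in qa_pairs:
    pred = ask_model(item["video_path"], item["question"], item["options"])
    correct += pred == item["answer"]

print(f"Accuracy: {correct / len(qa_pairs):.2%}")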
Stars: 732
Forks: 27
Language: —
License: —
Category: —
Last pushed: Dec 08, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/MME-Benchmarks/Video-MME"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
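For programmatic access, a minimal Python sketch of the same request is shown below. It assumes the endpoint returns JSON; the response schema is not documented here, so the result is simply pretty-printed.

import json
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/MME-Benchmarks/Video-MME"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surfaces HTTP errors, e.g. hitting the 100 requests/day limit
print(json.dumps(resp.json(), indent=2))  # inspect the payload, then pick out the fields you need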
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice