MME-Benchmarks/Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
This project provides a comprehensive benchmark for evaluating how well Multi-modal Large Language Models (MLLMs) understand and analyze video content. It covers diverse video types, together with their subtitles and audio, and assesses an MLLM's ability to answer questions about each video; a minimal sketch of this evaluation flow appears below. It is aimed at researchers and developers building or improving AI models that interpret video.
Use this if you need to rigorously test the video understanding capabilities of your AI model across diverse real-world scenarios and different video lengths.
Not ideal if you are looking for an off-the-shelf tool for video content analysis without developing or evaluating an AI model.
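To make the evaluation flow above concrete, here is a minimal, hypothetical sketch of the kind of loop the benchmark supports, assuming a multiple-choice QA format. The file name, the field names (video_path, question, options, answer), and the ask_model function are illustrative placeholders, not the benchmark's actual API.

import json

def ask_model(video_path, question, options):
    # Placeholder: call your MLLM here and return one option letter, e.g. "A".
    raise NotImplementedError

with open("video_mme_qa.json") as f:  # hypothetical file of QA pairs
    qa_pairs = json.load(f)

correct = 0
for item in qa_pairs:
    pred = ask_model(item["video_path"], item["question"], item["options"])
    correct += pred == item["answer"]

print(f"Accuracy: {correct / len(qa_pairs):.2%}")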
Stars: 732
Forks: 27
Language: —
License: —
Category: —
Last pushed: Dec 08, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/MME-Benchmarks/Video-MME"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
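For programmatic access, a minimal Python sketch of the same request is shown below. It assumes the endpoint returns JSON; the response schema is not documented here, so the result is simply pretty-printed.

import json
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/MME-Benchmarks/Video-MME"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surfaces HTTP errors, e.g. hitting the 100 requests/day limit
print(json.dumps(resp.json(), indent=2))  # inspect the payload, then pick out the fields you need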
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice