PKU-YuanGroup/Video-Bench

A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models!

Score: 23 / 100 (Experimental)

This project provides a comprehensive way to assess how well large language models (LLMs) can understand and reason about video content. It takes various video datasets and associated question-and-answer pairs as input, then produces a systematic evaluation of how accurately different video-based LLMs perform. This is for researchers and developers who are building or improving LLMs specifically designed to interpret and make decisions based on video.

138 stars. No commits in the last 6 months.

Use this if you need to rigorously test and compare the capabilities of different video-based large language models across a range of understanding and decision-making tasks.

Not ideal if you are looking for an off-the-shelf video analysis tool for end-user applications rather than an LLM evaluation benchmark.

video-understanding large-language-models model-evaluation AI-benchmarking video-AI-development
No License · Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 5 / 25


Stars: 138
Forks: 3
Language: Python
License: None
Last pushed: Dec 31, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/PKU-YuanGroup/Video-Bench"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
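The per-category scores returned by the API can be combined into the overall score shown above. A minimal sketch in Python, assuming the endpoint returns a JSON object with one numeric field per category (the field names below are hypothetical, not confirmed by the API docs):

```python
import json

# Hypothetical example payload; the actual response schema may differ.
sample = '{"maintenance": 0, "adoption": 10, "maturity": 8, "community": 5}'

scores = json.loads(sample)

# The overall score is the sum of the four category scores (each out of 25).
total = sum(scores.values())
print(total)  # 23, matching the 23/100 shown above
```

In a real client you would replace `sample` with the body returned by the `curl` URL above (e.g. via `urllib.request` or `requests`).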