zeyofu/BLINK_Benchmark

This repo contains evaluation code for the ECCV 2024 paper "BLINK: Multimodal Large Language Models Can See but Not Perceive" (https://arxiv.org/abs/2404.12390).

Score: 37 / 100 (Emerging)

This project provides a benchmark for evaluating how well multimodal large language models (LLMs) perform core visual perception tasks. It takes classic computer vision problems, such as relative depth estimation and forensic detection, reformats them as multiple-choice questions over images, and measures each model's accuracy. It is aimed at researchers and developers working to improve the visual intelligence of multimodal AI models.
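For context, here is a minimal sketch of the kind of multiple-choice scoring loop such a benchmark performs. It assumes the BLINK dataset is loaded from the Hugging Face Hub with the config/split and `prompt`/`answer`/`image_1` field names shown, which may differ from the repo's actual loaders; `query_model` is a hypothetical stand-in for whatever multimodal LLM you call.

```python
# Sketch of a BLINK-style multiple-choice evaluation loop.
# Dataset name, config, split, and field names are assumptions;
# check the repo's own loaders for the exact interface.
import re
from datasets import load_dataset

def extract_choice(response: str) -> str | None:
    """Naively pull the first option letter like '(A)' or 'A' out of text."""
    match = re.search(r"\(?([A-D])\)?", response)
    return match.group(1) if match else None

def evaluate(query_model, task: str = "Relative_Depth") -> float:
    data = load_dataset("BLINK-Benchmark/BLINK", task, split="val")
    correct = 0
    for example in data:
        # query_model is hypothetical: takes a text prompt and an image,
        # returns the model's free-form answer string.
        response = query_model(example["prompt"], example["image_1"])
        if extract_choice(response) == extract_choice(example["answer"]):
            correct += 1
    return correct / len(data)
```

Accuracy here is simply the fraction of questions where the extracted option letter matches the gold answer, which is how multiple-choice benchmarks of this kind are typically scored.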

164 stars. No commits in the last 6 months.

Use this if you are a researcher or developer who wants to rigorously test and compare the visual perception capabilities of multimodal LLMs against human performance and other AI models.

Not ideal if you are looking for a tool to directly apply multimodal LLMs to solve real-world visual tasks, as this is an evaluation benchmark rather than an application.

multimodal-AI-evaluation computer-vision-benchmarking AI-perception-research LLM-visual-understanding
Stale (6m) · No package · No dependents
Maintenance: 2 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 9 / 25


Stars: 164
Forks: 8
Language: Python
License: Apache-2.0
Last pushed: Sep 27, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/zeyofu/BLINK_Benchmark"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
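If you prefer Python over curl, here is a minimal sketch using `requests`. The endpoint URL comes from the command above; the JSON field names in the response are an assumption, so inspect the payload before depending on specific keys.

```python
# Fetch the same quality data from Python.
# The response schema (e.g. score or per-axis keys) is an assumption;
# print the raw JSON first to see what the API actually returns.
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/zeyofu/BLINK_Benchmark"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()
data = resp.json()
print(data)
```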