Q-Future/Q-Bench

[ICLR 2024 Spotlight] A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment, evaluated on GPT-4V, Gemini-Pro, Qwen-VL-Plus, and 16 open-source MLLMs.

Overall score: 36 / 100 (Emerging)

This project provides a standardized way to test how well multi-modal large language models (MLLMs) understand and interpret visual content, with a focus on 'low-level' image qualities such as brightness, blurriness, and overall visual appeal. It takes images (single or pairs) and questions about their visual characteristics as input, then evaluates how accurately the model answers. It is useful for researchers and developers who build or evaluate visual AI systems and need to rigorously assess performance on fine-grained visual details.
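To make that input/output shape concrete, here is a minimal sketch of the kind of multiple-choice item such a benchmark evaluates, and the simplest way to score a model's answer. The field names, the example question, and the exact-match scoring are illustrative assumptions, not Q-Bench's actual data schema or evaluation protocol.

# Hypothetical item in the (image, question, choices, answer) shape described above.
# Field names are illustrative; they are not Q-Bench's actual schema.
item = {
    "image": "example.jpg",
    "question": "How is the clarity of this image?",
    "choices": ["High", "Acceptable", "Low"],
    "answer": "Low",
}

def is_correct(model_answer: str, item: dict) -> bool:
    # Exact-match scoring against the labeled choice (a simplifying assumption;
    # real benchmarks often need more robust answer matching).
    return model_answer.strip() == item["answer"]

print(is_correct("Low", item))  # True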

282 stars. No commits in the last 6 months.

Use this if you are a researcher or engineer working with multi-modal LLMs and need a benchmark to evaluate their ability to perceive, describe, and assess image quality and other low-level visual attributes.

Not ideal if you want to apply LLMs to high-level image-understanding tasks such as object recognition or scene description, without a focus on underlying visual quality or fine-grained perception.

Tags: multi-modal AI evaluation, image quality assessment, computer vision, benchmarking, LLM visual understanding, AI model performance
Flags: Stale (6 months), No Package, No Dependents
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 10 / 25

How are scores calculated? The four 25-point sub-scores above sum to the overall score: 0 + 10 + 16 + 10 = 36 / 100.

Stars: 282
Forks: 13
Language: Jupyter Notebook
License: (none listed)
Last pushed: Aug 12, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Q-Future/Q-Bench"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
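For scripted access, here is a minimal Python sketch of the same request. It assumes only the endpoint shown in the curl example above; because the response schema is not documented here, it simply pretty-prints whatever JSON the API returns.

import json
import urllib.request

# Same endpoint as the curl example above; no key is required at the free tier.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Q-Future/Q-Bench"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# Inspect the actual structure before relying on specific field names.
print(json.dumps(data, indent=2))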