Wang-ML-Lab/multimodal-needle-in-a-haystack

[NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models

Score: 33 / 100 (Emerging)

This project provides a benchmark dataset and evaluation tools for testing how well multimodal AI models can find a specific image within a very large collection of images. Given a long sequence of images and a text description of a 'needle' image, the benchmark checks whether the model can pinpoint that image's exact location. It is intended for AI researchers and engineers who develop or evaluate multimodal large language models.
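
To make "pinpoint that image's exact location" concrete, here is a minimal sketch of exact-match scoring. It is not the repository's evaluation code. The function name `exact_match_accuracy` and the `(image index, row, column)` location tuples are assumptions chosen for illustration; the benchmark's real output format may differ.

```python
from typing import Hashable, Sequence


def exact_match_accuracy(
    predicted: Sequence[Hashable], expected: Sequence[Hashable]
) -> float:
    """Return the fraction of samples whose predicted location equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    # A prediction counts only when every component of the location matches.
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


# Hypothetical locations encoded as (image index, row, column).
expected = [(2, 0, 1), (5, 3, 3), (0, 1, 2)]
predicted = [(2, 0, 1), (5, 3, 2), (0, 1, 2)]  # second prediction has the wrong column

accuracy = exact_match_accuracy(predicted, expected)
print(f"exact-match accuracy: {accuracy:.2f}")
```

Exact matching is the strictest choice: a prediction that names the right image but the wrong position inside it earns no credit.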

Use this if you need to rigorously evaluate the long-context understanding capabilities of multimodal AI models, especially their ability to process many images and precise text instructions to locate specific visual content.

Not ideal if you are looking for an off-the-shelf application to search images, as this is a research benchmark for AI model developers, not an end-user search tool.

multimodal-AI large-language-models AI-benchmarking image-retrieval model-evaluation
No License | No Package | No Dependents
Maintenance 10 / 25
Adoption 8 / 25
Maturity 8 / 25
Community 7 / 25


Stars: 54
Forks: 3
Language: Python
License: None
Last pushed: Feb 22, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Wang-ML-Lab/multimodal-needle-in-a-haystack"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
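
The same request can be made from Python using only the standard library. This is a sketch under two assumptions that the page does not confirm: that the endpoint returns JSON, and that other repositories follow the same `owner/repo` path pattern as the curl example. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record; assumes the response body is JSON."""
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality("Wang-ML-Lab", "multimodal-needle-in-a-haystack")` issues one request, which counts against the daily limit.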