Wang-ML-Lab/multimodal-needle-in-a-haystack
[NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models
This project provides a benchmark dataset and evaluation tools to test how well multimodal AI models can find a specific image within a very large collection of images. Given a large set of images and a text description of a 'needle' image, the benchmark checks whether the model can pinpoint that image's exact location. It is intended for AI researchers and engineers who develop or evaluate multimodal large language models.
Use this if you need to rigorously evaluate the long-context understanding capabilities of multimodal AI models, especially their ability to process many images and precise text instructions to locate specific visual content.
Not ideal if you are looking for an off-the-shelf application to search images, as this is a research benchmark for AI model developers, not an end-user search tool.
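As a rough illustration of what "pinpointing the exact location" could mean when scoring, the sketch below computes exact-match accuracy over a few samples. The `Location` tuple (image index plus a row and column inside that image), the use of `None` for "needle not present", and the function name are all assumptions made for this example; they are not taken from the repository's code.

```python
from typing import NamedTuple, Optional, Sequence


class Location(NamedTuple):
    """Hypothetical needle position: which image, and which grid cell in it."""
    image_index: int
    row: int
    col: int


def exact_match_accuracy(
    predictions: Sequence[Optional[Location]],
    targets: Sequence[Optional[Location]],
) -> float:
    """Return the fraction of samples whose predicted location equals the target.

    None stands for "needle not present" (an assumption of this sketch).
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    # A prediction counts only if every field matches the target exactly.
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Three made-up samples: one exact hit, one off by a column, one correct "absent".
targets = [Location(2, 1, 0), Location(0, 3, 3), None]
predictions = [Location(2, 1, 0), Location(0, 3, 2), None]
accuracy = exact_match_accuracy(predictions, targets)
print(f"exact-match accuracy: {accuracy:.2f}")
```

Exact matching is deliberately strict here: a prediction that names the right image but the wrong cell scores zero, which fits the description's emphasis on the exact location.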
Stars: 54
Forks: 3
Language: Python
License: not specified
Category: not specified
Last pushed: Feb 22, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Wang-ML-Lab/multimodal-needle-in-a-haystack"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
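The same request can be made from Python with the standard library alone. This is a minimal sketch: it assumes the URL path is laid out as category, owner, repository (inferred from the single example URL above), and it returns the response body as raw text because the response format is not documented here. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; the segment layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(url: str, timeout: float = 10.0) -> str:
    """Fetch the endpoint and return the body as text (format not assumed)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


url = build_quality_url(
    "transformers", "Wang-ML-Lab", "multimodal-needle-in-a-haystack"
)
print(url)
# Uncomment to perform the request; each call counts against the daily limit.
# print(fetch_quality(url))
```

The network call is left commented out so that running the snippet does not consume any of the 100 keyless requests per day.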
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice