Wang-ML-Lab/multimodal-needle-in-a-haystack

[NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models


Emerging

This project provides a benchmark dataset and evaluation tools for testing how well multimodal AI models can find a specific image within a very large collection of images. You supply a large set of images together with a text description of a 'needle' image, and the benchmark checks whether the model can pinpoint that image's exact location. It is intended for AI researchers and engineers who develop or evaluate advanced multimodal large language models.
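The scoring step described above can be sketched as an exact-match check between the location the model reports and the true location of the needle. This is a minimal illustration under stated assumptions, not the repository's evaluation code. It hypothetically assumes that a location is written as three integers (`image_index, row, column`) and that any answer without three integers counts as wrong. The names `parse_location` and `exact_match_accuracy` are hypothetical.

```python
import re
from typing import Optional, Sequence, Tuple

# Hypothetical location format (an assumption, not the benchmark's spec):
# (image_index, row, column)
Location = Tuple[int, int, int]


def parse_location(answer: str) -> Optional[Location]:
    """Extract the first three integers from a model's free-text answer.

    Returns None when fewer than three integers are present, so the
    answer is scored as incorrect.
    """
    numbers = re.findall(r"-?\d+", answer)
    if len(numbers) < 3:
        return None
    return int(numbers[0]), int(numbers[1]), int(numbers[2])


def exact_match_accuracy(answers: Sequence[str], targets: Sequence[Location]) -> float:
    """Fraction of answers whose parsed location equals the ground truth exactly."""
    if len(answers) != len(targets):
        raise ValueError("answers and targets must have the same length")
    if not targets:
        return 0.0
    # A bool counts as 1 (match) or 0 (no match) when summed.
    correct = sum(
        parse_location(answer) == target
        for answer, target in zip(answers, targets)
    )
    return correct / len(targets)
```

For example, with answers `["The needle is at 3, 1, 2", "2, 0, 0", "I cannot find it"]` and targets `[(3, 1, 2), (2, 1, 0), (0, 0, 0)]`, only the first answer matches, so the accuracy is 1/3. Parsing integers with a regular expression tolerates free-text replies, which matters because chat-style models rarely answer in a strict format.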

Use this if you need to rigorously evaluate the long-context understanding of multimodal AI models, especially their ability to process many images together with precise text instructions and locate specific visual content.

Not ideal if you are looking for an off-the-shelf application to search images, as this is a research benchmark for AI model developers, not an end-user search tool.

multimodal-AI large-language-models AI-benchmarking image-retrieval model-evaluation

No license, no package, no dependents

Maintenance 10 / 25

Adoption 8 / 25

Maturity 8 / 25

Community 7 / 25



Language: Python

License: none

Higher-rated alternatives

KimMeen/Time-LLM

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...

om-ai-lab/VLM-R1

Solve Visual Understanding with Reinforced VLMs

bytedance/SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

NVlabs/OmniVinci

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

fixie-ai/ultravox

A fast multimodal LLM for real-time voice
