om-ai-lab/ZoomEye

[EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration

Quality score: 35 / 100 (Emerging)

This project helps anyone working with images that contain many small, detailed elements. Given an image and a question about its content, it uses a multimodal LLM to progressively 'zoom in' on different parts of the image, much as a human would, exploring candidate regions as a tree of views. The output is a more accurate answer to your question, especially for images where fine details matter. It is aimed at AI researchers and practitioners who build or evaluate advanced vision-language models.

Use this if your current multimodal AI models struggle to accurately answer questions about images containing dense information or very fine-grained details.

Not ideal if you primarily work with simple images where the relevant information is easily visible without needing to 'zoom in', or if you are not building/evaluating advanced AI models.
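
The tree-based exploration named in the title can be pictured roughly as follows. This is a minimal illustrative sketch, not the repository's actual implementation: score_fn and answer_fn are hypothetical placeholders for multimodal-LLM calls, the quadrant split is one arbitrary splitting scheme, and image is assumed to be a PIL image.

# Sketch of best-first, tree-based zooming (hypothetical names throughout).
import heapq
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

@dataclass
class View:
    box: Box
    depth: int

def split(box: Box):
    """Split a view into four quadrants (one simple splitting scheme)."""
    l, t, r, b = box
    mx, my = (l + r) // 2, (t + b) // 2
    return [(l, t, mx, my), (mx, t, r, my), (l, my, mx, b), (mx, my, r, b)]

def zoom_search(image, question, score_fn, answer_fn,
                max_depth=3, threshold=0.8) -> Optional[str]:
    """Best-first search over progressively zoomed-in views.

    score_fn(crop, question) -> float in [0, 1]: model-judged relevance.
    answer_fn(crop, question) -> str: model answer on the cropped view.
    Both stand in for multimodal-LLM calls; image is a PIL.Image.Image.
    """
    w, h = image.size
    frontier = [(-score_fn(image, question), 0, View((0, 0, w, h), 0))]
    tie = 1  # tie-breaker so the heap never has to compare View objects
    while frontier:
        neg_score, _, view = heapq.heappop(frontier)
        crop = image.crop(view.box)
        # Confident enough, or as deep as allowed: answer from this view.
        if -neg_score >= threshold or view.depth == max_depth:
            return answer_fn(crop, question)
        # Otherwise zoom in: score each quadrant and keep exploring.
        for child_box in split(view.box):
            child_score = score_fn(image.crop(child_box), question)
            heapq.heappush(frontier, (-child_score, tie,
                                      View(child_box, view.depth + 1)))
            tie += 1
    return None

The search visits the most promising view first and only answers once a crop looks confident enough, which is the 'human-like zooming' intuition: look at the whole picture, then drill into the region that seems to contain the answer.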

AI research · multimodal AI · image analysis · computer vision · natural language processing
No License · No Package · No Dependents
Maintenance 6 / 25
Adoption 9 / 25
Maturity 8 / 25
Community 12 / 25


Stars: 77
Forks: 8
Language: Python
License: None
Last pushed: Nov 20, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/om-ai-lab/ZoomEye"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
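
For programmatic access, here is a minimal Python sketch using only the standard library. Only the URL comes from the listing above; the response schema is not documented here, so the script simply pretty-prints whatever JSON comes back.

import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/om-ai-lab/ZoomEye"

# Fetch the quality record for this repo; counts against the 100 requests/day
# anonymous quota mentioned above.
with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# The payload schema is assumed/unknown here: inspect it rather than
# hard-coding field names.
print(json.dumps(data, indent=2))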