TheShadow29/zsgnet-pytorch
Official implementation of ICCV19 oral paper Zero-Shot grounding of Objects from Natural Language Queries (https://arxiv.org/abs/1908.07129)
This project lets computer vision researchers and AI developers train models that locate a specific object in an image from a natural-language description, even when the model has never seen that exact object before. Given an image and a text query, it outputs the bounding box of the described object. It is aimed at those building advanced vision systems for tasks such as image search or intelligent assistance.
No commits in the last 6 months.
Use this if you are a computer vision researcher or AI developer working on models that need to locate objects in images based on descriptive text, especially for 'zero-shot' scenarios where the object might be novel.
Not ideal if you need a pre-trained, ready-to-use application for everyday image analysis and are not comfortable with machine learning model training and development.
Stars: 72
Forks: 12
Language: Python
License: MIT
Category:
Last pushed: Apr 22, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/TheShadow29/zsgnet-pytorch"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
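The same lookup can be scripted instead of run through curl. The sketch below builds the endpoint URL and fetches the JSON record; note that the response schema and the API-key header name are assumptions, since the listing only documents the URL and the rate limits.

```python
import json
from typing import Optional
from urllib.request import Request, urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/nlp"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given GitHub owner/repo."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, api_key: Optional[str] = None) -> dict:
    """Fetch the quality record as a dict.

    Passing a key lifts the rate limit to 1,000 requests/day; the
    "X-API-Key" header name is an assumption, not documented above.
    """
    headers = {"Accept": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key  # assumed header name
    req = Request(quality_url(owner, repo), headers=headers)
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the same URL used in the curl example above.
    print(quality_url("TheShadow29", "zsgnet-pytorch"))
```

Swapping in a different `owner`/`repo` pair queries any other listed repository through the same endpoint pattern.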
Higher-rated alternatives
TheShadow29/awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
microsoft/XPretrain
Multi-modality pre-training
TheShadow29/VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
zeyofu/BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can...
gicheonkang/sglkt-visdial
🌈 PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph...