TheShadow29/zsgnet-pytorch
Official implementation of ICCV19 oral paper Zero-Shot grounding of Objects from Natural Language Queries (https://arxiv.org/abs/1908.07129)
This project lets computer vision researchers and AI developers train models that locate a specific object in an image from a natural-language description, even when the model has never seen that exact object before. Given an image and a text query, it outputs the bounding box of the described object. It is aimed at those building advanced vision systems for tasks such as image search or intelligent assistance.
No commits in the last 6 months.
Use this if you are a computer vision researcher or AI developer working on models that need to locate objects in images based on descriptive text, especially for 'zero-shot' scenarios where the object might be novel.
Not ideal if you need a pre-trained, ready-to-use application for everyday image analysis and are not comfortable with machine learning model training and development.
Stars: 72
Forks: 12
Language: Python
License: MIT
Category:
Last pushed: Apr 22, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/TheShadow29/zsgnet-pytorch"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
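The same lookup can be scripted instead of run through curl. The sketch below builds the endpoint URL and fetches the JSON record; note that the response schema and the API-key header name are assumptions, since the listing only documents the URL and the rate limits.

```python
import json
from typing import Optional
from urllib.request import Request, urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/nlp"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given GitHub owner/repo."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, api_key: Optional[str] = None) -> dict:
    """Fetch the quality record as a dict.

    Passing a key lifts the rate limit to 1,000 requests/day; the
    "X-API-Key" header name is an assumption, not documented above.
    """
    headers = {"Accept": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key  # assumed header name
    req = Request(quality_url(owner, repo), headers=headers)
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the same URL used in the curl example above.
    print(quality_url("TheShadow29", "zsgnet-pytorch"))
```

Swapping in a different `owner`/`repo` pair queries any other listed repository through the same endpoint pattern.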
Higher-rated alternatives
TheShadow29/awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
microsoft/XPretrain
Multi-modality pre-training
TheShadow29/VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
zeyofu/BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can...
gicheonkang/sglkt-visdial
🌈 PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph...