clairecyq/whos-waldo
Who's Waldo? Linking People Across Text and Images. ICCV 2021.
This project helps link individual people mentioned in news articles, historical texts, or social media posts to their corresponding faces in associated images. You provide a collection of texts with person mentions and images containing faces, and it outputs connections between specific names and specific faces. This tool is for researchers, archivists, or analysts working with multimedia content who need to resolve who is who across different data types.
No commits in the last 6 months.
Use this if you need to automatically identify which person described in a text corresponds to which face in an accompanying image.
Not ideal if you are looking for an out-of-the-box application with a graphical user interface, as this project requires technical setup and scripting.
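To make the input and output shape concrete, here is a minimal sketch of linking names to faces as a one-to-one assignment. It is not the method from the paper or the repository: the function `link_names_to_faces`, the `score` matrix, and the example values are all hypothetical, and a real system would derive the scores from a trained model.

```python
from itertools import permutations


def link_names_to_faces(names, faces, score):
    """Return the one-to-one name -> face mapping with the highest total score.

    score[i][j] is a hypothetical compatibility between names[i] and faces[j].
    Brute force over all assignments, so only suitable for small inputs.
    """
    if len(names) > len(faces):
        raise ValueError("need at least as many faces as names")

    best_total, best_perm = float("-inf"), ()
    # Each permutation picks a distinct face index for every name.
    for perm in permutations(range(len(faces)), len(names)):
        total = sum(score[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm

    return {names[i]: faces[j] for i, j in enumerate(best_perm)}


# Illustrative values only: two names from a caption, three detected faces.
names = ["Alice", "Bob"]
faces = ["face_0", "face_1", "face_2"]
score = [
    [0.2, 0.7, 0.1],  # Alice vs. each face
    [0.6, 0.3, 0.1],  # Bob vs. each face
]
links = link_names_to_faces(names, faces, score)
```

With these made-up scores, `links` maps "Alice" to "face_1" and "Bob" to "face_0", and "face_2" is left unlinked.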
Stars: 13
Forks: 4
Language: Python
License: MIT
Category
Last pushed: May 17, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/clairecyq/whos-waldo"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
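The same request can be made from Python using only the standard library. This is a sketch under assumptions: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path pattern is inferred from the single URL above, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL (path pattern inferred from the one example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the record for one repository.

    Assumes the endpoint returns a JSON body; raises on HTTP or parse errors.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("computer-vision", "clairecyq", "whos-waldo")` issues the same GET request as the curl command. Keyless use is limited to 100 requests per day, so cache the result if you call it repeatedly.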
Higher-rated alternatives
peteanderson80/Matterport3DSimulator
AI Research Platform for Reinforcement Learning from Real Panoramic Images.
daveredrum/ScanRefer
[ECCV 2020] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
cambridgeltl/visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
TheShadow29/vognet-pytorch
[CVPR20] Video Object Grounding using Semantic Roles in Language Description...
jianghaojun/Awesome-3D-Vision-and-Language
A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering and 3D...