kuanghuei/SCAN
PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)
This project helps researchers and practitioners in computer vision and natural language processing match images with relevant text descriptions, and vice versa. It trains on paired images and text captions and produces models that retrieve the most relevant captions for a given image, or the most relevant images for a given caption. It is designed for people working with large datasets of multimedia content.
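Bidirectional retrieval can be illustrated with a small sketch. Assume a model has already scored every image against every caption. Ranking a row then retrieves captions for that image, and ranking a column retrieves images for that caption. Recall@K is the fraction of queries whose correct match lands in the top K results. The similarity values, the one-caption-per-image pairing, and the helper names (`rank_targets`, `recall_at_k`) are hypothetical. This is not the repository's own evaluation code.

```python
from typing import List

Matrix = List[List[float]]


def rank_targets(scores: List[float]) -> List[int]:
    # Candidate indices ordered from most to least similar.
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def recall_at_k(sim: Matrix, ground_truth: List[int], k: int) -> float:
    # sim[q][t]: similarity between query q and candidate t.
    # ground_truth[q]: index of the correct candidate for query q.
    hits = sum(
        1 for q, row in enumerate(sim) if ground_truth[q] in rank_targets(row)[:k]
    )
    return hits / len(sim)


def transpose(m: Matrix) -> Matrix:
    # Swap roles so captions become queries and images become candidates.
    return [list(col) for col in zip(*m)]


# Hypothetical scores: rows are images, columns are captions,
# and caption i is assumed to be the correct match for image i.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.7, 0.6],
]
gt = [0, 1, 2]

i2t_r1 = recall_at_k(sim, gt, 1)             # image -> text retrieval
i2t_r2 = recall_at_k(sim, gt, 2)
t2i_r1 = recall_at_k(transpose(sim), gt, 1)  # text -> image retrieval
t2i_r2 = recall_at_k(transpose(sim), gt, 2)
print(i2t_r1, i2t_r2, t2i_r1, t2i_r2)
```

In this toy case only one of three queries ranks its match first in each direction. Every query finds its match within the top two. Real similarity matrices are much larger, so a vectorized implementation (for example, one written in PyTorch) would typically replace these Python loops.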
579 stars. No commits in the last 6 months.
Use this if you need to develop or evaluate state-of-the-art models for finding the most relevant images for a text query or the most relevant text description for an image.
Not ideal if you are looking for an out-of-the-box solution that requires no deep learning expertise, or if you are working with data other than images and text.
Stars: 579
Forks: 118
Language: Python
License: Apache-2.0
Category:
Last pushed: May 18, 2023
Commits (30d): 0
Related frameworks
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis
Papers, code and datasets about deep learning and multi-modal learning for video analysis
KaiyangZhou/pytorch-vsumm-reinforce
Unsupervised video summarization with deep reinforcement learning (AAAI'18)
adambielski/siamese-triplet
Siamese and triplet networks with online pair/triplet mining in PyTorch