google/crossmodal-3600
Crossmodal-3600 dataset
Crossmodal-3600 (XM3600) is a dataset of 3,600 geographically diverse images, each paired with human-generated captions in 36 languages. It serves researchers and AI developers building systems that must understand and generate content across both visual and linguistic modalities, and is primarily used for training and evaluating multimodal models.
No commits in the last 6 months.
Use this if you are developing or researching AI models that need to learn from and process both images and their related textual descriptions; a minimal loading sketch follows below.
Not ideal if you need a dataset focused on a single modality (images only or text only), or if you require data from a highly specialized domain.
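The sketch below shows one way to iterate over image-caption pairs for a chosen language. It is a minimal sketch, not the official loader: the file name captions.jsonl, the "image/key" field, and the per-language caption lists are assumptions about the distribution format, so check the dataset's own documentation for the actual schema.

import json

def load_caption_pairs(path, lang="en"):
    # Yield (image_id, caption) pairs for one language.
    # Field names here are assumed, not the documented schema.
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            image_id = record["image/key"]  # assumed image identifier field
            for caption in record.get(lang, {}).get("caption", []):
                yield image_id, caption

if __name__ == "__main__":
    for image_id, caption in load_caption_pairs("captions.jsonl", lang="en"):
        print(image_id, caption)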
Stars: 10
Forks: 1
Language: HTML
License: —
Category: —
Last pushed: Jan 23, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/google/crossmodal-3600"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
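For programmatic access, a short Python sketch using the requests library is shown below. The JSON response shape and the X-API-Key header name are assumptions, as the endpoint's API documentation is not included on this page.

import requests

API_URL = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/google/crossmodal-3600"

def fetch_quality_data(api_key=None):
    # Anonymous access is limited to 100 requests/day; a free key raises
    # that to 1,000. The "X-API-Key" header name is an assumption.
    headers = {"X-API-Key": api_key} if api_key else {}
    response = requests.get(API_URL, headers=headers, timeout=10)
    response.raise_for_status()  # surface rate-limit and server errors
    return response.json()       # assumes the body is JSON

if __name__ == "__main__":
    print(fetch_quality_data())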
Higher-rated alternatives
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
adambielski/siamese-triplet
Siamese and triplet networks with online pair/triplet mining in PyTorch
HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis
Papers, code and datasets about deep learning and multi-modal learning for video analysis
KaiyangZhou/pytorch-vsumm-reinforce
Unsupervised video summarization with deep reinforcement learning (AAAI'18)